Preface 


First came the 32-bit SPARC Version 7 (V7) architecture, publicly released in 1987. 
Shortly after, the SPARC V8 architecture was announced and published in book 
form. The 64-bit SPARC V9 architecture was released in 1994. Now, the 
UltraSPARC Architecture specification provides the first significant update in over 
10 years to Sun's SPARC processor architecture. 


What's New? 


For the first time, UltraSPARC Architecture 2005 pulls together in one document all 
parts of the architecture: 


• the nonprivileged (Level 1) architecture from SPARC V9

• most of the privileged (Level 2) architecture from SPARC V9

• more in-depth coverage of all SPARC V9 features

Plus, it includes all of Sun's now-standard architectural extensions (beyond SPARC
V9), developed through the processor generations of UltraSPARC III, IV, IV+, and
T1:


• the VIS™ 1 and VIS 2 instruction set extensions and the associated GSR register
• multiple levels of global registers, controlled by the GL register
• Sun's 64-bit MMU architecture
• privileged instructions ALLCLEAN, OTHERW, NORMALW, and INVALW
• access to the VER register is now hyperprivileged (and VER was renamed the
HVER register)
• the SIR instruction is now hyperprivileged
• new hyperprivileged instructions RDHPR and WRHPR
• the new hyperprivileged mode
• Chip-level Multithreading (CMT) architecture


In addition, architectural features are now tagged with Software Classes and 
Implementation Classes. Software Classes provide a new, high-level view of the 
expected architectural longevity and portability of software that references those 
features. Implementation Classes give an indication of how efficiently each feature 
is likely to be implemented across current and future UltraSPARC Architecture 
processor implementations. This information provides guidance that should be 
particularly helpful to programmers who write in assembly language or those who 
write tools that generate SPARC instructions. It also provides the infrastructure for 
defining clear procedures for adding and removing features from the architecture 
over time, with minimal software disruption. 
CHAPTER 1 


Document Overview 





This chapter discusses: 


• Navigating UltraSPARC Architecture 2005
• Fonts and Notational Conventions
• Reporting Errors in this Specification





1.1 Navigating UltraSPARC Architecture 2005


If you are new to the SPARC architecture, read Chapter 3, Architecture Overview;
study the definitions in Chapter 2, Definitions; then look into the subsequent
chapters and appendixes for more detail in areas of interest to you.


If you are familiar with the SPARC V9 architecture but not UltraSPARC Architecture 
2005, note that UltraSPARC Architecture 2005 conforms to the SPARC V9 Level 1 
architecture (and most of Level 2), with numerous extensions, particularly with 
respect to CMT features, VIS instructions, and support for hyperprivileged-mode 
operation. 

This specification is structured as follows:

• Chapter 2, Definitions, defines key terms used throughout the specification.
• Chapter 3, Architecture Overview, provides an overview of UltraSPARC
Architecture 2005.
• Chapter 4, Data Formats, describes the supported data formats.
• Chapter 5, Registers, describes the register set.
• Chapter 6, Instruction Set Overview, provides a high-level description of the
UltraSPARC Architecture 2005 instruction set.
• Chapter 7, Instructions, describes the UltraSPARC Architecture 2005 instruction
set in great detail.
• Chapter 8, IEEE Std 754-1985 Requirements for UltraSPARC Architecture 2005,
describes how UltraSPARC Architecture 2005 floating-point operation meets the
requirements of IEEE Std 754-1985.
• Chapter 9, Memory, describes the supported memory model.
• Chapter 10, Address Space Identifiers (ASIs), provides a complete list of
supported ASIs.
• Chapter 11, Performance Instrumentation, describes the architecture for
performance monitoring hardware.
• Chapter 12, Traps, describes the trap model.
• Chapter 13, Interrupt Handling, describes how interrupts are handled.
• Chapter 14, Memory Management, describes MMU operation.
• Chapter 15, Chip-Level Multithreading (CMT), describes the new CMT features.
• Chapter 16, Resets, describes resets, RED_state, and error_state.
• Appendix A, Opcode Maps, provides the overall picture of how the instruction
set is mapped into opcodes.
• Appendix B, Implementation Dependencies, describes all implementation
dependencies.
• Appendix C, Assembly Language Syntax, describes extensions to the SPARC
assembly language syntax; in particular, synthetic instructions are documented
in this appendix.


1.2 Fonts and Notational Conventions 


Fonts are used as follows: 


Italic font is used for emphasis, book titles, and the first instance of a word that is 
defined. 


Italic font is also used for terms where substitution is expected, for example, 
"fccn", "virtual processor n", or "reg_plus_imm". 
Italic sans serif font is used for exception and trap names. For example, “The 
privileged_action exception...” 


lowercase helvetica font is used for register field names (named bits) and 
instruction field names, for example: "The rs1 field contains...” 


UPPERCASE HELVETICA font is used for register names; for example, FSR. 


TYPEWRITER (Courier) font is used for literal values, such as code (assembly 
language, C language, ASI names) and for state names. For example: %f0, 
ASI_PRIMARY, execute_state. 

















2 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008 


• When a register field is shown along with its containing register name, they are 
separated by a period ('.'), for example, "FSR.cexc". 


• UPPERCASE words are acronyms or instruction names. Some common acronyms 
appear in the glossary in Chapter 2, Definitions. Note: Names of some instructions 
contain both upper- and lower-case letters. 


• An underscore character joins words in register, register field, exception, and trap 
names. Note: Such words may be split across lines at the underscore without an 
intervening hyphen. For example: “This is true whenever the integer_condition_
code field...” 


The following notational conventions are used: 


• The left arrow symbol ( ← ) is the assignment operator. For example, 
“PC ← PC + 1” means that the Program Counter (PC) is incremented by 1. 


• Square brackets ( [ ] ) are used in two different ways, distinguishable by the 
context in which they are used: 


– Square brackets indicate indexing into an array. For example, TT[TL] means the 
element of the Trap Type (TT) array, as indexed by the contents of the Trap 
Level (TL) register. 


– Square brackets are also used to indicate optional additions/extensions to 
symbol names. For example, “ST[D | Q]F” expands to all three of “STF”, 
“STDF”, and “STQF”. Similarly, ASI_PRIMARY[_LITTLE] indicates two 
related address space identifiers, ASI_PRIMARY and ASI_PRIMARY_LITTLE. 
(Contrast with the use of angle brackets, below.) 








• Angle brackets ( < > ) indicate mandatory additions/extensions to symbol names. 
For example, “ST<D | Q>F” expands to mean “STDF” and “STQF”. (Contrast with 
the second use of square brackets, above.) 


• Curly braces ( { } ) indicate a bit field within a register or instruction. For example, 
CCR{4} refers to bit 4 in the Condition Codes Register. 


• A consecutive set of values is indicated by specifying the upper and lower limits of 
the set separated by a colon ( : ); for example, CCR{3:0} refers to the set of four 
least significant bits of register CCR. (Contrast with the use of double periods, 
below.) 


• A double period ( .. ) indicates that any single intermediate value between two given 
end values is possible. For example, NAME[2..0] indicates that four forms of NAME 
exist: NAME, NAME2, NAME1, and NAME0; whereas NAME<2..0> indicates 
that three forms exist: NAME2, NAME1, and NAME0. (Contrast with the use of 
the colon, above.) 


• A vertical bar ( | ) separates mutually exclusive alternatives inside square 
brackets ( [ ] ), angle brackets ( < > ), or curly braces ( { } ). For example, 
“NAME[A | B]” expands to “NAME, NAMEA, NAMEB” and “NAME<A | B>” 
expands to “NAMEA, NAMEB”. 
• The asterisk ( * ) is used as a wild card, encompassing the full set of valid values. 
For example, FCMP* refers to FCMP with all valid suffixes (in this case, 
FCMP<s | d | q> and FCMPE<s | d | q>). An asterisk is typically used when the full 
list of valid values either is not worth listing (because it has little or no relevance 
in the given context) or is too numerous to list in the available space. 


• The slash ( / ) is used to separate paired or complementary values in a list, for 
example, "the LDBLOCKF/STBLOCKF instruction pair ...." 


• The double colon (::) is an operator that indicates concatenation (typically, of bit 
vectors). Concatenation strictly strings the specified component values into a 
single longer string, in the order specified. The concatenation operator performs 
no arithmetic operation on any of the component values. 


1.2.1 Implementation Dependencies 


Implementors of UltraSPARC Architecture 2005 processors are allowed to resolve 
some aspects of the architecture in machine-dependent ways. 


The definition of each implementation dependency is indicated by the notation 
“IMPL. DEP. #nn-XX: Some descriptive text". The number nn provides an index into 
the complete list of dependencies in Appendix B, Implementation Dependencies. 


A reference to (but not definition of) an implementation dependency is indicated by 
the notation "(impl. dep. nn)". 


1.2.2 Notation for Numbers 


Numbers throughout this specification are decimal (base-10) unless otherwise 
indicated. Numbers in other bases are followed by a numeric subscript indicating 
their base (for example, 1001₂, FFFF 0000₁₆). Long binary and hexadecimal numbers 
within the text have spaces inserted every four characters to improve readability. 
Within C language or assembly language examples, numbers may be preceded by 
“0x” to indicate base-16 (hexadecimal) notation (for example, 0xFFFF0000). 


1.2.3 Informational Notes 


This guide provides several different types of information in notes, as follows: 


Note
General notes contain incidental information relevant to the paragraph
preceding the note.


Programming Note
Programming notes contain incidental information about how software can use
an architectural feature.


Implementation Note
An Implementation Note contains incidental information, describing how an
UltraSPARC Architecture 2005 processor might implement an architectural
feature.


V9 Compatibility Note
A V9 Compatibility Note contains information about possible differences between
UltraSPARC Architecture 2005 and SPARC V9 implementations. Such information
is relevant to UltraSPARC Architecture 2005 implementations and might not apply
to other SPARC V9 implementations.


Forward Compatibility Note
A Forward Compatibility Note contains information about how the UltraSPARC
Architecture is expected to evolve in the future. Such notes are not intended as a
guarantee that the architecture will evolve as indicated, but as a guide to features
that software intended to run on both current and future implementations should
not depend upon to remain the same.


1.3 Reporting Errors in this Specification 


This specification has been reviewed for completeness and accuracy. Nonetheless, as 
with any document of this size, errors and omissions may occur, and reports of them 
are welcome. Please send "bug reports" and other comments on this document to 
the email address: UA-editor@sun.com 
CHAPTER 2 


Definitions 





This chapter defines concepts and terminology common to all implementations of 
UltraSPARC Architecture 2005. 


address space
A range of 2^64 locations that can be addressed by instruction fetches and load,
store, or load-store instructions. See also address space identifier (ASI).

address space identifier (ASI)
An 8-bit value that identifies a particular address space. An ASI is (implicitly
or explicitly) associated with every instruction access or data access. See also
implicit ASI.

aliased
Said of each of two virtual or real addresses that refer to the same underlying
memory location.

application program
A program executed with the virtual processor in nonprivileged mode. Note:
Statements made in this specification regarding application programs may not
be applicable to programs (for example, debuggers) that have access to
privileged virtual processor state (for example, as stored in a memory-image
dump).

ASI
Address space identifier.

ASR
Ancillary State register.

available (virtual processor)
A virtual processor that is physically present and functional, that can be
enabled and used.

big-endian
An addressing convention. Within a multiple-byte integer, the byte with the
smallest address is the most significant; a byte's significance decreases as its
address increases.

BLD
(Obsolete) abbreviation for Block Load instruction; replaced by LDBLOCKF.

BST
(Obsolete) abbreviation for Block Store instruction; replaced by STBLOCKF.

byte
Eight consecutive bits of data, aligned on an 8-bit boundary.


CCR
Abbreviation for Condition Codes Register.

clean window
A register window in which each of the registers contains 0, a valid address
from the current address space, or valid data from the current address space.

CMT
Chip-level MultiThreading (or, as an adjective, Chip-level MultiThreaded).
Refers to a physical processor containing more than one virtual processor.

coherence
A set of protocols guaranteeing that all memory accesses are globally visible to
all caches on a shared-memory bus.

completed (memory operation)
Said of a memory transaction when an idealized memory has executed the
transaction with respect to all processors. A load is considered completed
when no subsequent memory transaction can affect the value returned by the
load. A store is considered completed when no subsequent load can return the
value that was overwritten by the store.

context
A set of translations that defines a particular address space. See also Memory
Management Unit (MMU).

context ID
A numeric value that uniquely identifies a particular context.

copyback
The process of sending a copy of the data from a cache line owned by a
physical processor core, in response to a snoop request from another device.

CPI
Cycles per instruction. The number of clock cycles it takes to execute an
instruction.

core
In an UltraSPARC Architecture processor, may refer to either a virtual
processor or a physical processor core.

cross-call
An interprocessor call in a system containing multiple virtual processors.

CTI
Abbreviation for control-transfer instruction.

current window
The block of 24 R registers that is presently in use. The Current Window
Pointer (CWP) register points to the current window.

data access (instruction)
A load, store, load-store, or FLUSH instruction.

DCTI
Delayed control transfer instruction.

demap
To invalidate a mapping in the MMU.

denormalized number
Synonym for subnormal number.
deprecated
The term applied to an architectural feature (such as an instruction or register)
for which an UltraSPARC Architecture implementation provides support only
for compatibility with previous versions of the architecture. Use of a
deprecated feature must generate correct results but may compromise software
performance.

Deprecated features should not be used in new UltraSPARC Architecture
software and may not be supported in future versions of the architecture.

disable (core)
The process of changing the state of a virtual processor to Disabled, during
which all other processor state (including cache coherency) may be lost and all
interrupts to that virtual processor will be discarded. See also park and
enable.

disabled (core)
A virtual processor that is out of operation (not executing instructions, not
participating in cache coherency, and discarding interrupts). See also parked
and enabled.

doubleword
An 8-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.

D-SFAR
Data Synchronous Fault Address register.

enable (core)
The process of moving a virtual processor from Disabled to Enabled state
and preparing it for operation. See also disable and park.

enabled (core)
A virtual processor that is in operation (participating in cache coherency, but
not executing instructions unless it is also Running). See also disabled and
running.

even parity
The mode of parity checking in which each combination of data bits plus a
parity bit contains an even number of '1' bits.

exception
A condition that makes it impossible for the processor to continue executing
the current instruction stream. Some exceptions may be masked (that is, trap
generation disabled; for example, floating-point exceptions masked by
FSR.tem) so that the decision on whether or not to apply special processing
can be deferred and made by software at a later time. See also trap.

explicit ASI
An ASI that is provided by a load, store, or load-store alternate instruction
(either from its imm_asi field or from the ASI register).

extended word
An 8-byte datum, nominally containing integer data. Note: The definition of
this term is architecture dependent and may differ from that used in other
processor architectures.

fccn
One of the floating-point condition code fields fcc0, fcc1, fcc2, or fcc3.

FGU
Floating-point and Graphics Unit (which most implementations specify as a
superset of FPU).
floating-point exception
An exception that occurs during the execution of a floating-point operate
(FPop) instruction. The exceptions are unfinished_FPop, unimplemented_FPop,
sequence_error, hardware_error, invalid_fp_register, or IEEE_754_exception.

F register
A floating-point register. The SPARC V9 architecture includes single-, double-,
and quad-precision F registers.

floating-point operate instructions
Instructions that perform floating-point calculations, as defined in
Floating-Point Operate (FPop) Instructions on page 133. FPop instructions do not
include FBfcc instructions, loads and stores between memory and the F registers,
or non-floating-point operations that read or write F registers.

floating-point trap type
The specific type of a floating-point exception, encoded in the FSR.ftt field.

floating-point unit
A processing unit that contains the floating-point registers and performs
floating-point operations, as defined by this specification.

FPop
Abbreviation for floating-point operate (instructions).

FPRS
Floating-Point Register State register.

FPU
Floating-Point Unit.

FSR
Floating-Point Status register.

GL
Global Level register.

GSR
General Status register.

halfword
A 2-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.

hyperprivileged
An adjective that describes:
(1) the state of the processor when HPSTATE.hpriv = 1, that is, when the
processor is in hyperprivileged mode;
(2) processor state that is only accessible to software while the processor is in
hyperprivileged mode; for example, hyperprivileged registers,
hyperprivileged ASRs, or, in general, hyperprivileged state;
(3) an instruction that can be executed only when the processor is in
hyperprivileged mode.

hypervisor (software)
A layer of software that executes in hyperprivileged processor state. One
purpose of hypervisor software (also referred to as "the hypervisor") is to
provide greater isolation between operating system ("supervisor") software
and the underlying processor implementation.

IEEE 754
IEEE Standard 754-1985, the IEEE Standard for Binary Floating-Point
Arithmetic.
IEEE-754 exception
A floating-point exception, as specified by IEEE Std 754-1985. Listed within
this specification as IEEE_754_exception.

implementation
Hardware or software that conforms to all of the specifications of an
instruction set architecture (ISA).

implementation dependent
An aspect of the UltraSPARC Architecture that can legitimately vary among
implementations. In many cases, the permitted range of variation is specified.
When a range is specified, compliant implementations must not deviate from
that range.

implicit ASI
An address space identifier that is implicitly supplied by the virtual processor
on all instruction accesses and on data accesses that do not explicitly provide
an ASI value (from either an imm_asi instruction field or the ASI register).

initiated
Synonym for issued.

instruction field
A bit field within an instruction word.

instruction group
One or more independent instructions that can be dispatched for simultaneous
execution.

instruction set architecture
A set that defines instructions, registers, instruction and data memory, the
effect of executed instructions on the registers and memory, and an algorithm
for controlling instruction execution. Does not define clock cycle times, cycles
per instruction, data paths, etc. This specification defines the UltraSPARC
Architecture 2005 instruction set architecture.

integer unit
A processing unit that performs integer and control-flow operations and
contains general-purpose integer registers and virtual processor state registers,
as defined by this specification.

interrupt request
A request for service presented to a virtual processor by an external device.

inter-strand
Describes an operation that crosses virtual processor (strand) boundaries.

intra-strand
Describes an operation that occurs entirely within one virtual processor
(strand).

invalid (ASI or address)
Undefined, reserved, or illegal.

ISA
Instruction set architecture.

issued
A memory transaction (load, store, or atomic load-store) is said to be "issued"
when a virtual processor has sent the transaction to the memory subsystem
and the completion of the request is out of the virtual processor's control.
Synonym for initiated.

IU
Integer Unit.
little-endian
An addressing convention. Within a multiple-byte integer, the byte with the
smallest address is the least significant; a byte's significance increases as its
address increases.

load
An instruction that reads (but does not write) memory or reads (but does not
write) location(s) in an alternate address space. Examples of loads include
loads into integer or floating-point registers, block loads, and alternate address
space variants of those instructions. See also load-store and store, the
definitions of which are mutually exclusive with load.

load-store
An instruction that explicitly both reads and writes memory or explicitly reads
and writes location(s) in an alternate address space. Load-store includes
instructions such as CASA, CASXA, LDSTUB, and the deprecated SWAP
instruction. See also load and store, the definitions of which are mutually
exclusive with load-store.

may
A keyword indicating flexibility of choice with no implied preference. Note:
"may" indicates that an action or operation is allowed; "can" indicates that it is
possible.

Memory Management Unit
The address translation hardware in an UltraSPARC Architecture
implementation that translates 64-bit virtual addresses into underlying physical
addresses. The MMU is composed of the TLBs, ASRs, and ASI registers used
to manage address translation. See also context, physical address, real
address, and virtual address.

MMU
Abbreviation for Memory Management Unit.

multiprocessor system
A system containing more than one processor.

must
A keyword indicating a mandatory requirement. Designers must implement
all such mandatory requirements to ensure interoperability with other
UltraSPARC Architecture-compliant products. Synonym for shall.

next program counter
Conceptually, a register that contains the address of the instruction to be
executed next if a trap does not occur.

NFO
Nonfault access only.

nonfaulting load
A load operation that behaves identically to a normal load operation, except
when supplied an invalid effective address by software. In that case, a regular
load triggers an exception whereas a nonfaulting load appears to ignore the
exception and loads its destination register with a value of zero (on an
UltraSPARC Architecture processor, hardware treats regular and nonfaulting
loads identically; the distinction is made in trap handler software). Contrast
with speculative load.

nonprivileged
An adjective that describes:
(1) the state of the virtual processor when PSTATE.priv = 0 and
HPSTATE.hpriv = 0, that is, when it is in nonprivileged mode;
(2) virtual processor state information that is accessible to software regardless
of the current privilege mode; for example, nonprivileged registers,
nonprivileged ASRs, or, in general, nonprivileged state;
(3) an instruction that can be executed in any privilege mode (hyperprivileged,
privileged, or nonprivileged).

nonprivileged mode
The mode in which a virtual processor is operating when executing application
software (at the lowest privilege level). Nonprivileged mode is defined by
PSTATE.priv = 0 and HPSTATE.hpriv = 0. See also privileged and
hyperprivileged.

nontranslating ASI
An ASI that does not refer to memory (for example, refers to control/status
register(s)) and for which the MMU does not perform address translation.

normal trap
A trap processed in execute_state (or equivalently, a non-RED_state trap).
Contrast with RED_state trap.

NPC
Next program counter.

npt
Nonprivileged trap.

nucleus software
Privileged software running at a trap level greater than 0 (TL > 0).

NUMA
Nonuniform memory access.

N_REG_WINDOWS
The number of register windows present in a particular implementation.

octlet
Eight bytes (64 bits) of data. Not to be confused with "octet," which has been
commonly used to describe eight bits of data. In this document, the term byte,
rather than octet, is used to describe eight bits of data.

odd parity
The mode of parity checking in which each combination of data bits plus a
parity bit together contains an odd number of '1' bits.

opcode
A bit pattern that identifies a particular instruction.

optional
A feature not required for UltraSPARC Architecture 2005 compliance.

PA
Physical address.

park
The process of suspending a virtual processor from operation. There may be a
delay until the virtual processor is parked, but no heavyweight operation (such
as a reset) is required to complete the parking process. See also disable and
unpark.

parked
Said of a virtual processor that is suspended from operation. When parked, a
virtual processor does not issue instructions for execution but still maintains
cache coherency. See also disabled, enabled, and running.

PC
Program counter.

PCR
Performance Control register.
physical address
An address that maps to actual physical memory or I/O device space. See also
real address and virtual address.

physical core
The term physical processor core, or just physical core, is similar to the term
pipeline but represents a broader collection of hardware that is required for
performing the execution of instructions from one or more software threads.
For a detailed definition of this term, see page 531. See also pipeline,
processor, strand, thread, and virtual processor.

physical processor
Synonym for processor; used when an explicit contrast needs to be drawn
between processor and virtual processor. See also processor and virtual
processor.

PIC
Performance Instrumentation Counter.

PIL
Processor Interrupt Level register.

pipeline
Refers to an execution pipeline, the basic collection of hardware needed to
execute instructions. For a detailed definition of this term, see page 531. See
also physical core, processor, strand, thread, and virtual processor.

PIPT
Physically indexed, physically tagged (cache).

POR
Power-on reset.

prefetchable
(1) An attribute of a memory location that indicates to an MMU that
PREFETCH operations to that location may be applied.
(2) A memory location condition for which the system designer has
determined that no undesirable effects will occur if a PREFETCH operation to
that location is allowed to succeed. Typically, normal memory is prefetchable.

Nonprefetchable locations include those that, when read, change state or cause
external events to occur. For example, some I/O devices are designed with
registers that clear on read; others have registers that initiate operations when
read. See also side effect.

privileged
An adjective that describes:
(1) the state of the virtual processor when PSTATE.priv = 1 and
HPSTATE.hpriv = 0, that is, when the virtual processor is in privileged mode;
(2) processor state that is only accessible to software while the virtual processor
is in hyperprivileged or privileged mode; for example, privileged registers,
privileged ASRs, or, in general, privileged state;
(3) an instruction that can be executed only when the virtual processor is in
hyperprivileged or privileged mode.

privileged mode
The mode in which a processor is operating when PSTATE.priv = 1 and
HPSTATE.hpriv = 0. See also nonprivileged and hyperprivileged.
processor
The unit on which a shared interface is provided to control the configuration
and execution of a collection of strands; a physical module that plugs into a
system. Synonym for processor module. For a detailed definition of this term,
see page 531. See also pipeline, physical core, strand, thread, and virtual
processor.

processor core
Synonym for physical core.

processor module
Synonym for processor.

program counter
A register that contains the address of the instruction currently being executed.

quadword
A 16-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.

R register
An integer register. Also called a general-purpose register or working register.

RA
Real address.

RAS
Reliability, Availability, and Serviceability.

RAW
Read After Write (hazard).

rd
Rounding direction.

real address
An address produced by a virtual processor that refers to a particular
software-visible memory location, as viewed from privileged mode. Virtual
addresses are usually translated by a combination of hardware and software to
real addresses, which can be used to access real memory. See also virtual
address.

RED_state
Reset, Error, and Debug state. The virtual processor state when
HPSTATE.red = 1. A restricted execution environment used to process resets
and traps that occur when TL = MAXTL - 1.

RED_state trap
A trap processed in RED_state. Contrast with normal trap.

reserved
Describing an instruction field, certain bit combinations within an instruction
field, or a register field that is reserved for definition by future versions of the
architecture.

A reserved instruction field must read as 0, unless the implementation supports
extended instructions within the field. The behavior of an UltraSPARC
Architecture 2005 virtual processor when it encounters a nonzero value in a
reserved instruction field is as defined in Reserved Opcodes and Instruction Fields
on page 134.

A reserved bit combination within an instruction field is defined in Chapter 7,
Instructions. In all cases, an UltraSPARC Architecture 2005 processor must
decode and trap on such reserved bit combinations.
A reserved field within a register reads as 0 in current implementations and, when 
written by software, should always be written with values of that field 
previously read from that register or with the value zero (as described in 
Reserved Register Fields on page 48). 
Throughout this specification, figures and tables illustrating registers and
instruction encodings indicate reserved fields and reserved bit combinations
with a wide ("em") dash (—).

reset trap  A vectored transfer of control to hyperprivileged software through a fixed-
address reset trap table. Reset traps cause entry into RED state.

restricted  Describes an address space identifier (ASI) that may be accessed only while the
virtual processor is operating in privileged or hyperprivileged mode.

retired  An instruction is said to be "retired" when one of the following two events has
occurred:

(1) A precise trap has been taken, with TPC containing the instruction's
address (the instruction has not changed architectural state in this case).

(2) The instruction's execution has progressed to a point at which architectural
state affected by the instruction has been updated such that all three of the
following are true:

• The PC has advanced beyond the instruction.
• Except for deferred trap handlers, no consumer in the same instruction
stream can see the old values and all consumers in the same instruction
stream will see the new values.
• Stores are visible to all loads in the same instruction stream, including stores
to noncacheable locations.

RMO  Abbreviation for Relaxed Memory Order (a memory model).

RTO  Read to Own (a type of transaction, used to request ownership of a cache line).

RTS  Read to Share (a type of transaction, used to request read-only access to a
cache line).

running  A state of a virtual processor in which it is in operation (maintaining cache
coherency and issuing instructions for execution) and not parked.

service processor  A device external to the processor that can examine and alter internal
processor state. A service processor may be used to control/coordinate a
multiprocessor system and aid in error recovery.

SFSR  Synchronous Fault Status register.

shall  Synonym for must.

should  A keyword indicating flexibility of choice with a strongly preferred
implementation. Synonym for it is recommended.

side effect  The result of a memory location having additional actions beyond the reading
or writing of data. A side effect can occur when a memory operation on that
location is allowed to succeed. Locations with side effects include those that,
when accessed, change state or cause external events to occur. For example,
some I/O devices contain registers that clear on read; others have registers that
initiate operations when read. See also prefetchable.
SIMD  Single Instruction/Multiple Data; a class of instructions that perform identical
operations on multiple data contained (or "packed") in each source operand.

SIR  Software-initiated reset.

snooping  The process of maintaining coherency between caches in a shared-memory bus
architecture. Each cache controller monitors (snoops) the bus to determine
whether it needs to copy back or invalidate its copy of each shared cache block.

speculative load  A load operation that is issued by a virtual processor speculatively, that is,
before it is known whether the load will be executed in the flow of the
program. Speculative accesses are used by hardware to speed program
execution and are transparent to code. An implementation, through a
combination of hardware and system software, must nullify speculative loads
on memory locations that have side effects; otherwise, such accesses produce
unpredictable results. Contrast with nonfaulting load.

store  An instruction that writes (but does not explicitly read) memory or writes (but
does not explicitly read) location(s) in an alternate address space. Examples
of stores include stores from either integer or floating-point registers,
block stores, Partial Store, and alternate address space variants of those
instructions. See also load and load-store, the definitions of which are
mutually exclusive with store.

strand  The hardware state that must be maintained in order to execute a software
thread. For a detailed definition of this term, see page 530. See also pipeline,
physical core, processor, thread, and virtual processor.

subnormal number  A nonzero floating-point number, the exponent of which has a value of zero. A
more complete definition is provided in IEEE Standard 754-1985.

superscalar  An implementation that allows several instructions to be issued, executed, and
committed in one clock cycle.

supervisor software  Software that executes when the virtual processor is in privileged mode.

suspend  Synonym for park.

suspended  Synonym for parked.

synchronization  An operation that causes the processor to wait until the effects of all previous
instructions are completely visible before any subsequent instructions are
executed.

system  A set of virtual processors that share a common physical address space.

taken  A control-transfer instruction (CTI) is taken when the CTI writes the target
address value into NPC.

A trap is taken when the control flow changes in response to an exception,
reset, Tcc instruction, or interrupt. An exception must be detected and
recognized before it can cause a trap to be taken.
TBA  Trap base address.

thread  A software entity that can be executed on hardware. For a detailed definition
of this term, see page 530. See also pipeline, physical core, processor, strand,
and virtual processor.

TLB  Abbreviation for Translation Lookaside Buffer.

TLB hit  The desired translation is present in the TLB.

TLB miss  The desired translation is not present in the TLB.

TNPC  Trap-saved next program counter.

TPC  Trap-saved program counter.

Translation Lookaside Buffer  A cache within an MMU that contains recently used Translation
Table Entries (TTEs). TLBs speed up translations by often eliminating the need to reread
TTEs from memory.

trap  The action taken by a virtual processor when it changes the instruction flow in
response to the presence of an exception, reset, a Tcc instruction, or an
interrupt. The action is a vectored transfer of control to more-privileged
software through a table, the address of which is specified by the privileged
Trap Base Address (TBA) register or the Hyperprivileged Trap Base Address
(HTBA) register. See also exception.

TSB  Translation storage buffer. A table of the address translations that is
maintained by software in system memory and that serves as a cache of
virtual-to-real address mappings.

TSO  Total Store Order (a memory model).

TTE  Translation Table Entry. Describes the virtual-to-real, virtual-to-physical, or
real-to-physical translation and page attributes for a specific page in the page
table. In some cases, this term is explicitly used to refer to entries in the TSB.

UA-2005  UltraSPARC Architecture 2005.

unassigned  A value (for example, an ASI number), the semantics of which are not
architecturally mandated and which may be determined independently by
each implementation within any guidelines given.

undefined  An aspect of the architecture that has deliberately been left unspecified.
Software should have no expectation of, nor make any assumptions about, an
undefined feature or behavior. Use of such a feature can deliver unexpected
results and may or may not cause a trap. An undefined feature may vary
among implementations, and may also vary over time on a given
implementation.

Notwithstanding any of the above, undefined aspects of the architecture shall
not cause security holes (such as changing the privilege state or allowing
circumvention of normal restrictions imposed by the privilege state), put a
virtual processor into a more-privileged mode, or put the virtual processor into
an unrecoverable state.

unimplemented  An architectural feature that is not directly executed in hardware because it is
optional or is emulated in software.

unpark  The process of bringing a virtual processor out of suspension. There may be a
delay until the virtual processor is unparked, but no heavyweight operation
(such as a reset) is required to complete the unparking process. See also
disable and park.

unparked  Synonym for running.

unpredictable  Synonym for undefined.

uniprocessor system  A system containing a single virtual processor.

unrestricted  Describes an address space identifier (ASI) that can be used in all privilege
modes; that is, regardless of the value of PSTATE.priv and HPSTATE.hpriv.

user application program  Synonym for application program.

VA  Abbreviation for virtual address.

virtual address  An address produced by a virtual processor that refers to a particular software-
visible memory location. Virtual addresses usually are translated by a
combination of hardware and software to physical addresses, which can be
used to access physical memory. See also physical address and real address.

virtual core, virtual processor core  Synonyms for virtual processor.

virtual processor  The term virtual processor, or virtual processor core, is used to identify each
strand in a processor. At any given time, an operating system can have a
different thread scheduled on each virtual processor. For a detailed definition
of this term, see page 531. See also pipeline, physical core, processor, strand,
and thread.

VIS  Abbreviation for VIS™ Instruction Set.

VP  Abbreviation for virtual processor.

WDR  Watchdog reset.

word  A 4-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.

XIR  Externally initiated reset.
CHAPTER 3


Architecture Overview 





The UltraSPARC Architecture supports 32-bit and 64-bit integer and 32-bit, 64-bit,
and 128-bit floating-point as its principal data types. The 32-bit and 64-bit floating-
point types conform to IEEE Std 754-1985. The 128-bit floating-point type conforms
to IEEE Std 1596.5-1992. The architecture defines general-purpose integer, floating-
point, and special state/status register instructions, all encoded in 32-bit-wide
instruction formats. The load/store instructions address a linear, 2^64-byte virtual
address space.


The UltraSPARC Architecture 2005 specification describes a processor architecture to
which Sun Microsystems' SPARC processor implementations (beginning with
UltraSPARC T1) comply. Future implementations are expected to comply with either
this document or a later revision of this document.


The UltraSPARC Architecture 2005 is a descendant of the SPARC V9 architecture and 
complies fully with the "Level 1" (nonprivileged) SPARC V9 specification. 


Nonprivileged (application) software that is intended to be portable across all
SPARC V9 processors should be written to adhere to The SPARC Architecture Manual,
Version 9.


Material in this document specific to UltraSPARC Architecture 2005 processors may 
not apply to SPARC V9 processors produced by other vendors. 


In this specification, the word architecture refers to the processor features that are 
visible to an assembly language programmer or to a compiler code generator. It does 
not include details of the implementation that are not visible or easily observable by 
software, nor those that only affect timing (performance). 
3.1 The UltraSPARC Architecture 2005


This section briefly describes features, attributes, and components of the 
UltraSPARC Architecture 2005 and, further, describes correct implementation of the 
architecture specification and SPARC V9-compliance levels. 


3.1.1 Features 


The UltraSPARC Architecture 2005, like its ancestor SPARC V9, includes the 
following principal features: 


A linear 64-bit address space with 64-bit addressing. 


32-bit wide instructions — These are aligned on 32-bit boundaries in memory. 
Only load and store instructions access memory and perform I/O. 


Few addressing modes — A memory address is given as either "register +
register" or "register + immediate".


Triadic register addresses — Most computational instructions operate on two 
register operands or one register and a constant and place the result in a third 
register. 


A large windowed register file — At any one instant, a program sees 8 global 
integer registers plus a 24-register window of a larger register file. The windowed 
registers can be used as a cache of procedure arguments, local values, and return 
addresses. 


Floating point — The architecture provides an IEEE 754-compatible floating-
point instruction set, operating on a separate register file that provides 32 single-
precision (32-bit), 32 double-precision (64-bit), and 16 quad-precision (128-bit)
overlaid registers.


Fast trap handlers — Traps are vectored through a table. 


Multiprocessor synchronization instructions — Multiple variations of atomic 
load-store memory operations are supported. 


Predicted branches — The branch with prediction instructions allow the
compiler or assembly language programmer to give the hardware a hint about
whether a branch will be taken.


Branch elimination instructions — Several instructions can be used to eliminate 
branches altogether (for example, Move on Condition). Eliminating branches 
increases performance in superscalar and superpipelined implementations. 


Hardware trap stack — A hardware trap stack is provided to allow nested traps. 
It contains all of the machine state necessary to return to the previous trap level. 
The trap stack makes the handling of faults and error conditions simpler, faster, 
and safer. 
In addition, UltraSPARC Architecture 2005 includes the following features that were
not present in the SPARC V9 specification:

• Hyperprivileged mode, which simplifies porting of operating systems, supports
far greater portability of operating system (privileged) software, supports the
ability to run multiple simultaneous guest operating systems, and provides more
robust handling of error conditions.

• Multiple levels of global registers — Instead of the two 8-register sets of global
registers specified in the SPARC V9 architecture, UltraSPARC Architecture 2005
provides multiple sets; typically, one set is used at each trap level.

• Extended instruction set — UltraSPARC Architecture 2005 provides many
instruction set extensions, including the VIS instruction set for "vector" (SIMD)
data operations.

• More detailed, specific instruction descriptions — UltraSPARC Architecture
2005 provides many more details regarding what exceptions can be generated by
each instruction and the specific conditions under which those exceptions can
occur. Also, detailed lists of valid ASIs are provided for each load/store
instruction from/to alternate space.

• Detailed MMU architecture — Although some details of the UltraSPARC MMU
architecture are necessarily implementation-specific, UltraSPARC Architecture
2005 provides a blueprint for the UltraSPARC MMU, including software view
(TTEs and TSBs) and MMU hardware control registers.

• Chip-Level Multithreading (CMT) — UltraSPARC Architecture 2005 provides a
control architecture for highly threaded processor implementations.


3.1.2 Attributes


UltraSPARC Architecture 2005 is a processor instruction set architecture (ISA) derived 
from SPARC V8 and SPARC V9, which in turn come from a reduced instruction set 
computer (RISC) lineage. As an architecture, UltraSPARC Architecture 2005 allows 
for a spectrum of processor and system implementations at a variety of
price/performance points for a range of applications, including scientific/engineering,
programming, real-time, and commercial applications. 


3.1.2.1 Design Goals 


UltraSPARC Architecture 2005 is designed to be a target for
optimizing compilers and high-performance hardware implementations. This
specification documents the UltraSPARC Architecture 2005 and provides a design
specification against which an implementation can be verified, using appropriate
verification software.
3.1.2.2 Register Windows


UltraSPARC Architecture 2005 is derived from the SPARC
architecture, which was formulated at Sun Microsystems from 1984 through 1987. The
SPARC architecture is, in turn, based on the RISC I and II designs engineered at the
University of California at Berkeley from 1980 through 1982. The SPARC "register
window" architecture, pioneered in the UC Berkeley designs, allows for
straightforward, high-performance compilers and a reduction in memory load/store
instructions.


Note that privileged software, not user programs, manages the register windows.
Privileged software can save a minimum number of registers (approximately 24)
during a context switch, thereby optimizing context-switch latency.


3.1.3 System Components


The UltraSPARC Architecture 2005 allows for a spectrum of subarchitectures, such 
as cache system, I/O, and memory management unit (MMU). 


3.1.3.1 Binary Compatibility 


The most important mandate for the UltraSPARC Architecture is compatibility 
across implementations of the architecture for application (nonprivileged) software, 
down to the binary level. Binaries executed in nonprivileged mode should behave 
identically on all UltraSPARC Architecture systems when those systems are running 
an operating system known to provide a standard execution environment. One 
example of such a standard environment is the SPARC V9 Application Binary 
Interface (ABI). 


Although different UltraSPARC Architecture 2005 systems can execute 
nonprivileged programs at different rates, they will generate the same results as long 
as they are run under the same memory model. See Chapter 9, Memory, for more 
information. 


Additionally, UltraSPARC Architecture 2005 is binary upward-compatible from 
SPARC V9 for applications running in nonprivileged mode that conform to the 
SPARC V9 ABI and upward-compatible from SPARC V8 for applications running in 
nonprivileged mode that conform to the SPARC V8 ABI. 


3.1.3.2 UltraSPARC Architecture 2005 MMU 


Although the SPARC V9 architecture allows its implementations freedom in their 
MMU designs, UltraSPARC Architecture 2005 defines a common MMU architecture 
(see Chapter 14, Memory Management) with some specifics left to implementations 
(see processor implementation documents). 
3.1.3.3 Privileged Software


UltraSPARC Architecture 2005 does not assume that all implementations must
execute identical privileged software (operating systems) or hyperprivileged
software (hypervisors). Thus, certain traits that are visible to privileged software
may be tailored to the requirements of the system.


3.1.4 Architectural Definition


The UltraSPARC Architecture 2005 is defined by the chapters and appendixes of this
specification. A correct implementation of the architecture interprets a program
strictly according to the rules and algorithms specified in the chapters and
appendixes.


UltraSPARC Architecture 2005 defines a set of implementations that conform to the
SPARC V9 architecture, Level 1.


3.1.5 UltraSPARC Architecture 2005 Compliance with SPARC V9 Architecture


UltraSPARC Architecture 2005 fully complies with SPARC V9 Level 1
(nonprivileged). It partially complies with SPARC V9 Level 2 (privileged).


3.1.6 Implementation Compliance with UltraSPARC Architecture 2005


Compliant implementations must not add to or deviate from this standard except in 
aspects described as implementation dependent. Appendix B, Implementation 
Dependencies lists all UltraSPARC Architecture 2005, SPARC V9, and SPARC V8 
implementation dependencies. Documents for specific UltraSPARC Architecture 
2005 processor implementations describe the manner in which implementation 
dependencies have been resolved in those implementations. 


IMPL. DEP. #1-V8: Whether an instruction complies with UltraSPARC Architecture 
2005 by being implemented directly by hardware, simulated by software, or 
emulated by firmware is implementation dependent. 
3.2 Processor Architecture


An UltraSPARC Architecture processor logically consists of an integer unit (IU) and
a floating-point unit (FPU), each with its own registers. This organization allows for
implementations with concurrent integer and floating-point instruction execution.
Integer registers are 64 bits wide; floating-point registers are 32, 64, or 128 bits wide.
Instruction operands are single registers, register pairs, register quadruples, or
immediate constants.


An UltraSPARC Architecture virtual processor can run in nonprivileged mode,
privileged mode, or hyperprivileged mode. In hyperprivileged mode, the processor can
execute any instruction, including privileged instructions. In privileged mode, the
processor can execute nonprivileged and privileged instructions. In nonprivileged
mode, the processor can only execute nonprivileged instructions. In nonprivileged
or privileged mode, an attempt to execute an instruction requiring greater privilege
than the current mode causes a trap to hyperprivileged software.


3.2.1 Integer Unit (IU)


An UltraSPARC Architecture 2005 implementation’s integer unit contains the 
general-purpose registers and controls the overall operation of the virtual processor. 
The IU executes the integer arithmetic instructions and computes memory addresses 
for loads and stores. It also maintains the program counters and controls instruction 
execution for the FPU. 


IMPL. DEP. #2-V8: An UltraSPARC Architecture implementation may contain from 
72 to 640 general-purpose 64-bit R registers. This corresponds to a grouping of the 
registers into MAXGL + 1 sets of global R registers plus a circular stack of 
N_REG_WINDOWS sets of 16 registers each, known as register windows. The number 
of register windows present (N_REG_WINDOWS) is implementation dependent, within 
the range of 3 to 32 (inclusive). 
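The 72-to-640 range follows from simple arithmetic. The sketch below assumes eight registers per global set and 16 new registers per window (8 locals plus 8 ins, with each window's outs shared as the next window's ins); under that assumption, MAXGL = 2 with 3 windows and MAXGL = 15 with 32 windows reproduce the stated bounds. The function name is illustrative only.

```python
def r_register_count(maxgl: int, n_reg_windows: int) -> int:
    """Count R registers for given MAXGL and N_REG_WINDOWS values.

    Assumes (MAXGL + 1) global sets of 8 registers and 16 registers per window.
    """
    if not 3 <= n_reg_windows <= 32:
        raise ValueError("N_REG_WINDOWS must be between 3 and 32 inclusive")
    if maxgl < 0:
        raise ValueError("MAXGL cannot be negative")
    global_regs = (maxgl + 1) * 8        # one 8-register global set per GL value
    windowed_regs = n_reg_windows * 16   # 8 locals + 8 ins per window
    return global_regs + windowed_regs


print(r_register_count(2, 3))    # 72
print(r_register_count(15, 32))  # 640
```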


3.2.2 Floating-Point Unit (FPU)


An UltraSPARC Architecture 2005 implementation’s FPU has thirty-two 32-bit 
(single-precision) floating-point registers, thirty-two 64-bit (double-precision) 
floating-point registers, and sixteen 128-bit (quad-precision) floating-point registers, 
some of which overlap. 
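The overlap can be made concrete with a small lookup. This sketch relies on the SPARC V9 register-numbering convention, which this section does not itself spell out: double-precision registers are numbered with even numbers 0 through 62, quad-precision registers with multiples of 4 from 0 through 60, and only the doubles below 32 alias single-precision registers. Treat it as an illustration of that convention; the function names are hypothetical.

```python
from typing import List


def singles_in_double(d: int) -> List[int]:
    """Single-precision registers that alias double-precision register %f<d>."""
    if d % 2 != 0 or not 0 <= d <= 62:
        raise ValueError("double-precision register numbers are even, 0..62")
    # Only the doubles numbered 0..30 overlay single-precision registers.
    return [d, d + 1] if d < 32 else []


def doubles_in_quad(q: int) -> List[int]:
    """Double-precision registers that make up quad-precision register %f<q>."""
    if q % 4 != 0 or not 0 <= q <= 60:
        raise ValueError("quad-precision register numbers are multiples of 4, 0..60")
    return [q, q + 2]


print(singles_in_double(6))   # [6, 7]
print(singles_in_double(40))  # []
print(doubles_in_quad(8))     # [8, 10]
```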


If no FPU is present, then it appears to software as if the FPU is permanently 
disabled. 
If the FPU is not enabled, then an attempt to execute a floating-point instruction
generates an fp_disabled trap, and the fp_disabled trap handler software must either:

• Enable the FPU (if present) and reexecute the trapping instruction, or
• Emulate the trapping instruction in software.
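The handler's two options can be sketched as a decision function. The FpState class and its fields are hypothetical stand-ins: on real hardware, enabling the FPU means setting architectural enable bits (in SPARC V9, PSTATE.pef and FPRS.fef), which this sketch collapses into a single flag.

```python
from dataclasses import dataclass


@dataclass
class FpState:
    """Hypothetical model of the state an fp_disabled handler inspects."""
    fpu_present: bool
    fpu_enabled: bool  # stands in for the FPU enable bits


def handle_fp_disabled(state: FpState) -> str:
    """Return the action taken by an fp_disabled handler."""
    if state.fpu_present:
        state.fpu_enabled = True   # enable the FPU ...
        return "reexecute"         # ... then retry the trapping instruction
    return "emulate"               # no FPU: emulate the instruction in software


s = FpState(fpu_present=True, fpu_enabled=False)
print(handle_fp_disabled(s), s.fpu_enabled)        # reexecute True
print(handle_fp_disabled(FpState(False, False)))   # emulate
```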





3.3 Instructions


Instructions fall into the following basic categories:

• Memory access
• Integer arithmetic / logical / shift
• Control transfer
• State register access
• Floating-point operate
• Conditional move
• Register window management
• SIMD (single instruction, multiple data) instructions


These classes are discussed in the following subsections.


3.3.1 Memory Access


Load, store, load-store, and PREFETCH instructions are the only instructions that 
access memory. They use two R registers or an R register and a signed 13-bit 
immediate value to calculate a 64-bit, byte-aligned memory address. The Integer 
Unit appends an ASI to this address. 
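The address calculation can be modeled in a few lines. The selector argument i_bit reflects the SPARC V9 instruction format, where the i bit chooses between the register and immediate forms (an assumption from V9, not described in this section). The 13-bit immediate is sign-extended before the add, and the 64-bit sum wraps around.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(r_rs1: int, operand2: int, i_bit: int) -> int:
    """r[rs1] + r[rs2] when i = 0, or r[rs1] + sign_ext(simm13) when i = 1."""
    if i_bit:
        offset = sign_extend(operand2, 13)   # simm13 spans -4096..4095
    else:
        offset = operand2                    # contents of r[rs2]
    return (r_rs1 + offset) & MASK64         # 64-bit wraparound


print(hex(effective_address(0x2000, 0x1FFF, 1)))  # 0x1fff (simm13 = -1)
print(hex(effective_address(0x2000, 0x0010, 0)))  # 0x2010
```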


The destination field of the load/store instruction specifies either one or two R
registers or one, two, or four F registers that supply the data for a store or that
receive the data from a load.


Integer load and store instructions support byte, halfword (16-bit), word (32-bit),
and extended-word (64-bit) accesses. There are versions of integer load instructions
that perform either sign-extension or zero-extension on 8-bit, 16-bit, and 32-bit
values as they are loaded into a 64-bit destination register. Floating-point load and
store instructions support word, doubleword, and quadword¹ memory accesses.
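The difference between sign-extension and zero-extension shows up on any value whose top bit is set. The sketch below models only the resulting 64-bit register value; the instruction names in the docstring are the SPARC V9 mnemonics, given for orientation, and the function name is illustrative.

```python
MASK64 = (1 << 64) - 1


def extend_loaded_value(raw: int, size_bytes: int, signed: bool) -> int:
    """Return the 64-bit register value produced by loading `raw`.

    size_bytes: 1 (LDSB/LDUB), 2 (LDSH/LDUH), 4 (LDSW/LDUW) or 8 (LDX).
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("integer loads are 1, 2, 4, or 8 bytes")
    bits = 8 * size_bytes
    raw &= (1 << bits) - 1
    if signed and raw & (1 << (bits - 1)):
        raw -= 1 << bits              # negative two's-complement value
    return raw & MASK64               # as held in a 64-bit R register


print(hex(extend_loaded_value(0x80, 1, signed=True)))   # 0xffffffffffffff80
print(hex(extend_loaded_value(0x80, 1, signed=False)))  # 0x80
```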


CASA, CASXA, and LDSTUB are special atomic memory access instructions that 
concurrent processes use for synchronization and memory updates. 


Note | The SWAP instruction is also specified, but it is deprecated and 
should not be used in newly developed software. 
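The semantics of CASA/CASXA and LDSTUB can be captured in a behavioral model. This is a sketch, not a way to obtain hardware atomicity: a threading.Lock stands in for the indivisibility the hardware provides, memory is a big-endian bytearray, and the class and function names are hypothetical. It follows the usual SPARC V9 definitions: CAS compares memory with a comparison value, stores the new value only on a match, and always returns the old contents; LDSTUB returns the old byte and sets it to 0xFF, which is the basis of a simple spin lock.

```python
import threading


class AtomicMemoryModel:
    """Behavioral model of CASA/CASXA and LDSTUB on big-endian memory."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self._lock = threading.Lock()  # models the indivisibility of the access

    def cas(self, addr: int, compare: int, swap: int, size: int = 4) -> int:
        """CASA (size 4) or CASXA (size 8); always returns the old memory value."""
        with self._lock:
            old = int.from_bytes(self.data[addr:addr + size], "big")
            if old == compare:
                self.data[addr:addr + size] = swap.to_bytes(size, "big")
            return old

    def ldstub(self, addr: int) -> int:
        """Return the old byte at `addr` and set it to 0xFF."""
        with self._lock:
            old = self.data[addr]
            self.data[addr] = 0xFF
            return old


def spin_acquire(mem: AtomicMemoryModel, lock_addr: int) -> None:
    """LDSTUB spin lock: the lock is free when the byte reads as 0."""
    while mem.ldstub(lock_addr) != 0:
        pass


def spin_release(mem: AtomicMemoryModel, lock_addr: int) -> None:
    """Release by storing 0 (real code also needs whatever ordering the memory model requires)."""
    mem.data[lock_addr] = 0


mem = AtomicMemoryModel(16)
print(mem.cas(0, 0, 5))  # 0: comparison matched, memory now holds 5
print(mem.cas(0, 0, 7))  # 5: no match, memory unchanged
```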


1. No UltraSPARC Architecture processor currently implements the LDQF instruction in hardware; it generates
an exception and is emulated in hyperprivileged software.

The (nonportable) LDTXA instruction supplies an atomic 128-bit (16-byte) load that
is important in certain system software applications.


3.3.1.1 Memory Alignment Restrictions 


A memory access on an UltraSPARC Architecture virtual processor must typically be
aligned on an address boundary greater than or equal to the size of the datum being
accessed. An improperly aligned address in a load, store, or load-store instruction
may trigger an exception and cause a subsequent trap. For details, see Memory
Alignment Restrictions on page 116.
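The general rule can be expressed as a mask test, since every access size is a power of two. This sketch covers only the natural-alignment rule stated above; the text says alignment is required "typically", and the exceptions are described in Memory Alignment Restrictions, so a complete checker would need per-instruction cases. The function name is illustrative.

```python
def is_naturally_aligned(address: int, size_bytes: int) -> bool:
    """True if `address` is a multiple of `size_bytes` (a power of two, 1..16)."""
    if size_bytes not in (1, 2, 4, 8, 16):
        raise ValueError("access sizes are 1, 2, 4, 8, or 16 bytes")
    return (address & (size_bytes - 1)) == 0


print(is_naturally_aligned(0x1004, 4))  # True
print(is_naturally_aligned(0x1004, 8))  # False: a doubleword needs an 8-byte boundary
```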


3.3.1.2 Addressing Conventions 


The UltraSPARC Architecture uses big-endian byte order by default: the address of a 
quadword, doubleword, word, or halfword is the address of its most significant 
byte. Increasing the address means decreasing the significance of the unit being 
accessed. All instruction accesses are performed using big-endian byte order. 


The UltraSPARC Architecture also supports little-endian byte order for data accesses 
only: the address of a quadword, doubleword, word, or halfword is the address of 
its least significant byte. Increasing the address means increasing the significance of 
the data unit being accessed. 


Addressing conventions are illustrated in FIGURE 6-2 on page 119 and FIGURE 6-3 on
page 121.
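The two byte orders can be compared directly: the same four bytes read as a word give different values depending on whether the access is big-endian (the default) or little-endian. In this sketch, Python's int.from_bytes performs the byte-order interpretation, and the function name is illustrative.

```python
def read_unit(memory: bytes, address: int, size: int, little_endian: bool = False) -> int:
    """Read a `size`-byte unit starting at `address` using the given byte order."""
    unit = memory[address:address + size]
    if len(unit) != size:
        raise IndexError("access runs past the end of memory")
    return int.from_bytes(unit, "little" if little_endian else "big")


mem = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(read_unit(mem, 0, 4)))                      # 0x12345678: byte at address 0 is most significant
print(hex(read_unit(mem, 0, 4, little_endian=True)))  # 0x78563412: byte at address 0 is least significant
```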


3.3.1.3 Addressing Range


IMPL. DEP. #405-S10: An UltraSPARC Architecture implementation may support a
full 64-bit virtual address space or a more limited range of virtual addresses. In an
implementation that does not support a full 64-bit virtual address space, the
supported range of virtual addresses is restricted to two equal-sized ranges at the
extreme upper and lower ends of 64-bit addresses; that is, for n-bit virtual addresses,
the valid address ranges are 0 to 2^(n-1) - 1 and 2^64 - 2^(n-1) to 2^64 - 1.
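The two valid ranges are equivalent to requiring that bits 63 through n - 1 of an address all be equal; the gap between the ranges is often called the "VA hole". A minimal check, assuming n is the implementation's number of virtual address bits (the function name is illustrative):

```python
def va_is_valid(va: int, n: int) -> bool:
    """Check a 64-bit VA against the two ranges allowed for n-bit virtual addresses."""
    if not 1 <= n <= 64:
        raise ValueError("n must be between 1 and 64")
    if not 0 <= va < 1 << 64:
        raise ValueError("va must be a 64-bit unsigned value")
    half = 1 << (n - 1)
    # Lower range: 0 .. 2^(n-1) - 1; upper range: 2^64 - 2^(n-1) .. 2^64 - 1.
    return va < half or va >= (1 << 64) - half


print(va_is_valid(0x0000_7FFF_FFFF_FFFF, 48))  # True  (top of the lower range)
print(va_is_valid(0x0000_8000_0000_0000, 48))  # False (inside the hole)
print(va_is_valid(0xFFFF_8000_0000_0000, 48))  # True  (bottom of the upper range)
```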


3.3.1.4 Load/Store Alternate


Versions of load/store instructions, the load/store alternate instructions, can specify an 
arbitrary 8-bit address space identifier for the load/store data access. 

Access to alternate spaces 00₁₆–2F₁₆ is restricted to privileged and hyperprivileged
software, access to alternate spaces 30₁₆–7F₁₆ is restricted to hyperprivileged
software, and access to alternate spaces 80₁₆–FF₁₆ is unrestricted. Some of the ASIs
are available for implementation-dependent uses. Privileged and hyperprivileged
software can use the implementation-dependent ASIs to access special protected
registers, such as MMU control registers, cache control registers, virtual processor
state registers, and other processor-dependent or system-dependent values. See
Address Space Identifiers (ASIs) on page 122 for more information.
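The three ASI ranges translate directly into an access check. The mode names and numeric ranking below are illustrative assumptions, and the sketch encodes only the three ranges listed, not the behavior of individual implementation-dependent ASIs.

```python
MODE_RANK = {"nonprivileged": 0, "privileged": 1, "hyperprivileged": 2}


def minimum_mode_for_asi(asi: int) -> str:
    """Least-privileged mode allowed to use `asi`, per the three ASI ranges."""
    if not 0x00 <= asi <= 0xFF:
        raise ValueError("ASIs are 8-bit values")
    if asi <= 0x2F:
        return "privileged"       # 00-2F: privileged and hyperprivileged only
    if asi <= 0x7F:
        return "hyperprivileged"  # 30-7F: hyperprivileged only
    return "nonprivileged"        # 80-FF: unrestricted


def may_access(asi: int, mode: str) -> bool:
    """True if software running in `mode` may use `asi`."""
    return MODE_RANK[mode] >= MODE_RANK[minimum_mode_for_asi(asi)]


print(may_access(0x80, "nonprivileged"))  # True
print(may_access(0x20, "nonprivileged"))  # False
print(may_access(0x30, "privileged"))     # False
```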


Alternate space addressing is also provided for the atomic memory access 
instructions LDSTUBA, CASA, and CASXA. 


Note | The SWAPA instruction is also specified, but it is deprecated and 
should not be used in newly developed software. 


3.3.1.5 Separate Instruction and Data Memories


The interpretation of addresses can be unified, in which case the same translations 
and caching are applied to both instructions and data. Alternatively, addresses can 
be "split", in which case instruction references use one caching and translation 
mechanism and data references use another, although the same underlying main 
memory is shared. 


In such split-memory systems, the coherency mechanism may be split, so a write¹
into data memory is not immediately reflected in instruction memory. For this
reason, programs that modify their own instruction stream (self-modifying code²)
and that wish to be portable across all UltraSPARC Architecture (and SPARC V9)
processors must issue FLUSH instructions, or a system call with a similar effect, to
bring the instruction and data caches into a consistent state.


An UltraSPARC Architecture virtual processor may or may not have coherent
instruction and data caches. Even if an implementation does have coherent
instruction and data caches, a FLUSH instruction is required for self-modifying code,
not for cache coherency but to flush pipeline instruction buffers that may still hold
stale copies of instructions that have since been modified.
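Software that patches a range of instructions therefore issues FLUSH over every modified location. The sketch below assumes, as in SPARC V9, that each FLUSH covers at least the aligned doubleword containing its effective address, and enumerates the doubleword addresses a patching routine would flush. Because FLUSH cannot be issued from Python, the instruction (or an equivalent system call) is represented by a caller-supplied function; all names here are hypothetical.

```python
from typing import Callable, List


def flush_targets(start: int, length: int) -> List[int]:
    """Doubleword-aligned addresses covering the modified range [start, start + length)."""
    if length <= 0:
        return []
    first = start & ~7                  # round down to an 8-byte boundary
    last = (start + length - 1) & ~7    # doubleword holding the final modified byte
    return list(range(first, last + 1, 8))


def flush_modified_code(start: int, length: int, flush: Callable[[int], None]) -> None:
    """Call `flush` (a stand-in for the FLUSH instruction) once per doubleword."""
    for addr in flush_targets(start, length):
        flush(addr)


issued = []
flush_modified_code(0x1004, 8, issued.append)   # two 4-byte instructions patched
print([hex(a) for a in issued])                  # ['0x1000', '0x1008']
```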


3.3.1.6 Input/Output (I/O) 


The UltraSPARC Architecture assumes that input/output registers are accessed
through load/store alternate instructions, normal load/store instructions, or
read/write Ancillary State Register instructions (RDasr, WRasr).


IMPL. DEP. #123-V9: The semantic effect of accessing input/output (I/O) locations 
is implementation dependent. 


IMPL. DEP. #6-V8: Whether the I/O registers can be accessed by nonprivileged code 
is implementation dependent. 


IMPL. DEP. #7-V8: The addresses and contents of I/O registers are implementation 
dependent. 


1. This includes use of store instructions (executed on the same or another virtual processor) that write to
instruction memory, or any other means of writing into instruction memory (for example, DMA).


2. This is practiced, for example, by software such as debuggers and dynamic linkers.


3.3.1.7 Memory Synchronization 


Two instructions are used for synchronization of memory operations: FLUSH and 
MEMBAR. Their operation is explained in Flush Instruction Memory on page 188 and 
Memory Barrier on page 275, respectively. 


Note | STBAR is also available, but it is deprecated and should not be
used in newly developed software.


3.3.2 Integer Arithmetic / Logical / Shift Instructions


The arithmetic/logical/shift instructions perform arithmetic, tagged arithmetic, 
logical, and shift operations. With one exception, these instructions compute a result 
that is a function of two source operands; the result is either written into a 
destination register or discarded. The exception, SETHI, can be used in combination 
with other arithmetic and/or logical instructions to create a constant in an R register. 
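To illustrate the SETHI case: a 32-bit constant is conventionally built with SETHI supplying bits 31:10 and an OR with an immediate supplying bits 9:0 (the assembler's %hi() and %lo() operators). The C sketch below mimics that split. The function names are hypothetical. The model relies on the SPARC V9 rule that SETHI zeroes bits 9:0 and 63:32 of the destination register.

```c
#include <stdint.h>

/* %hi(): the 22 bits SETHI places in R[rd]{31:10}. */
uint32_t hi22(uint32_t value) { return value >> 10; }

/* %lo(): the low 10 bits, supplied by the OR's signed 13-bit immediate. */
uint32_t lo10(uint32_t value) { return value & 0x3FFu; }

/* Model of "sethi %hi(v), r ; or r, %lo(v), r".
   SETHI clears bits 9:0 (and 63:32), so the OR can fill in bits 9:0 directly. */
uint64_t build_constant(uint32_t value)
{
    uint64_t r = (uint64_t)hi22(value) << 10;  /* SETHI */
    r |= lo10(value);                          /* OR with immediate */
    return r;
}
```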


Shift instructions shift the contents of an R register left or right by a given number of 
bits ("shift count"). The shift distance is specified by a constant in the instruction or 
by the contents of an R register. 
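The following C model is a minimal sketch of the shift-count rule. It assumes the SPARC V9 semantics: the 32-bit shifts (SLL, SRL, SRA) use only the low 5 bits of the count, and the 64-bit shifts (SLLX, SRLX) use the low 6 bits. It also assumes that SRL and SRA operate on the low 32 bits of the source. The function names are illustrative, not an API.

```c
#include <stdint.h>

uint64_t sparc_sll(uint64_t rs1, uint64_t count)  { return rs1 << (count & 31); }  /* all 64 bits shift */
uint64_t sparc_sllx(uint64_t rs1, uint64_t count) { return rs1 << (count & 63); }
uint64_t sparc_srl(uint64_t rs1, uint64_t count)  { return (uint64_t)((uint32_t)rs1 >> (count & 31)); }
uint64_t sparc_srlx(uint64_t rs1, uint64_t count) { return rs1 >> (count & 63); }

/* SRA: arithmetic shift of the low 32 bits; the result is sign-extended to 64 bits.
   Sign handling is explicit because >> on negative signed values is
   implementation-defined in C. */
uint64_t sparc_sra(uint64_t rs1, uint64_t count)
{
    uint32_t v = (uint32_t)rs1;
    unsigned n = (unsigned)(count & 31);
    uint32_t shifted = v >> n;
    if ((v & 0x80000000u) && n != 0)
        shifted |= ~(0xFFFFFFFFu >> n);        /* replicate bit 31 into vacated bits */
    return (shifted & 0x80000000u) ? (uint64_t)shifted | UINT64_C(0xFFFFFFFF00000000)
                                   : (uint64_t)shifted;
}
```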


Control Transfer 


Control-transfer instructions (CTIs) include PC-relative branches and calls, register- 
indirect jumps, and conditional traps. Most of the control-transfer instructions are 
delayed; that is, the instruction immediately following a control-transfer instruction 
in logical sequence is dispatched before the control transfer to the target address is 
completed. Note that the next instruction in logical sequence may not be the 
instruction following the control-transfer instruction in memory. 


The instruction following a delayed control-transfer instruction is called a delay 
instruction. Setting the annul bit in a conditional delayed control-transfer instruction 
causes the delay instruction to be annulled (that is, to have no effect) if and only if 
the branch is not taken. Setting the annul bit in an unconditional delayed control- 
transfer instruction ("branch always") causes the delay instruction to be always 
annulled. 


Note | The SPARC V8 architecture specified that the delay instruction 
was always fetched, even if annulled, and that an annulled 
instruction could not cause any traps. The SPARC V9 
architecture does not require the delay instruction to be fetched 
if it is annulled. 
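The annul rule reduces to a small truth function. This sketch covers only the cases the text describes: a conditional branch versus an unconditional "branch always". The function name is hypothetical.

```c
#include <stdbool.h>

/* Returns true if the delay instruction after a delayed CTI takes effect.
   annul:         the instruction's annul (a) bit
   unconditional: true for "branch always"
   taken:         whether the transfer is taken */
bool delay_instruction_executes(bool annul, bool unconditional, bool taken)
{
    if (!annul)
        return true;      /* annul bit clear: delay instruction always executes */
    if (unconditional)
        return false;     /* "branch always" with a = 1: always annulled */
    return taken;         /* conditional with a = 1: annulled iff not taken */
}
```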


Branch and CALL instructions use PC-relative displacements. The jump and link
(JMPL) and return (RETURN) instructions use a register-indirect target address.
They compute their target addresses either as the sum of two R registers or as the
sum of an R register and a 13-bit signed immediate value. The "branch on condition
codes without prediction" instruction provides a displacement of ±8 Mbytes; the
"branch on condition codes with prediction" instruction provides a displacement of
±1 Mbyte; the "branch on register contents" instruction provides a displacement of
±128 Kbytes; and the CALL instruction's 30-bit word displacement allows a control
transfer to any address within ±2 gigabytes ($\pm 2^{31}$ bytes).
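These ranges follow from the displacement field widths. Each displacement counts 32-bit words and is sign-extended, so an n-bit field reaches about $\pm 2^{n-1}$ words, or $\pm 2^{n+1}$ bytes. The sketch assumes the SPARC V9 field widths: 22 bits (branch without prediction), 19 bits (branch with prediction), 16 bits (branch on register contents), and 30 bits (CALL). It also shows the 13-bit signed immediate used by JMPL/RETURN. The function names are illustrative.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field` (1 <= bits <= 62). */
int64_t sign_extend(uint64_t field, unsigned bits)
{
    field &= (UINT64_C(1) << bits) - 1;
    if (field & (UINT64_C(1) << (bits - 1)))
        return (int64_t)field - ((int64_t)1 << bits);
    return (int64_t)field;
}

/* PC-relative target: PC + 4 * sign_extend(disp). */
uint64_t pc_relative_target(uint64_t pc, uint64_t disp_field, unsigned disp_bits)
{
    return pc + (uint64_t)(sign_extend(disp_field, disp_bits) * 4);
}

/* Reach in bytes: -2^(bits-1) words backward, 2^(bits-1) - 1 words forward. */
int64_t max_backward_bytes(unsigned disp_bits) { return -(((int64_t)1 << (disp_bits - 1)) * 4); }
int64_t max_forward_bytes(unsigned disp_bits)  { return (((int64_t)1 << (disp_bits - 1)) - 1) * 4; }
```

For example, max_backward_bytes(22) is -8 Mbytes and max_backward_bytes(30) is $-2^{31}$ bytes, matching the figures quoted above.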


Note | The return from privileged trap instructions (DONE and 
RETRY) get their target address from the appropriate TPC or 
TNPC register. 


3.3.4 State Register Access


3.3.4.1 Ancillary State Registers 


The read and write ancillary state register instructions read and write the contents of
ancillary state registers visible to nonprivileged software (Y, CCR, ASI, PC, TICK,
and FPRS) and some registers visible only to privileged and hyperprivileged
software (PCR, SOFTINT, TICK_CMPR, and STICK_CMPR).


IMPL. DEP. #8-V8-Cs20: Ancillary state registers (ASRs) in the range 0-27 that are 
not defined in UltraSPARC Architecture 2005 are reserved for future architectural 
use. ASRs in the range 28-31 are available to be used for implementation-dependent 
purposes. 


IMPL. DEP. #9-V8-Cs20: The privilege level required to execute each of the 
implementation-dependent read/write ancillary state register instructions (for ASRs 
28-31) is implementation dependent. 


3.3.4.2 PR State Registers 


The read and write privileged register instructions (RDPR and WRPR) read and 
write the contents of state registers visible only to privileged and hyperprivileged 
software (TPC, TNPC, TSTATE, TT, TICK, TBA, PSTATE, TL, PIL, CWP, CANSAVE, 
CANRESTORE, CLEANWIN, OTHERWIN, and WSTATE). 


3.3.4.3 HPR State Registers


The read and write hyperprivileged register instructions (RDHPR and WRHPR) read
and write the contents of state registers visible only to hyperprivileged software
(HPSTATE, HTSTATE, HINTP, HVER, and HSTICK_CMPR).
3.3.5 Floating-Point Operate


Floating-point operate (FPop) instructions perform all floating-point calculations;
they are register-to-register instructions that operate on the floating-point registers.
FPops compute a result that is a function of one or two source operands. The groups
of instructions that are considered FPops are listed in Floating-Point Operate (FPop)
Instructions on page 133.


3.3.6 Conditional Move


Conditional move instructions conditionally copy a value from a source register to a
destination register, depending on an integer or floating-point condition code or on
the contents of an integer register. These instructions can be used to reduce the
number of branches in software.


3.3.7 Register Window Management


Register window instructions manage the register windows. SAVE and RESTORE
are nonprivileged and cause a register window to be pushed or popped. FLUSHW is
nonprivileged and causes all of the windows except the current one to be flushed to
memory. SAVED and RESTORED are used by privileged software to end a window
spill or fill trap handler.


3.3.8 SIMD


UltraSPARC Architecture 2005 includes SIMD (single instruction, multiple data)
instructions, also known as "vector" instructions, which allow a single instruction to
perform the same operation on multiple data items totalling 64 bits, such as eight
8-bit, four 16-bit, or two 32-bit data items. These operations are part of the "VIS"
extensions.
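To make the SIMD idea concrete, the C sketch below adds four 16-bit lanes packed in a 64-bit value. Each lane is added independently, modulo $2^{16}$, and no carry crosses a lane boundary. It is a software model written for illustration; the function name is hypothetical, and it is not a statement of any particular VIS instruction's exact semantics.

```c
#include <stdint.h>

/* Add four independent 16-bit lanes; carries are not propagated between lanes. */
uint64_t add16x4(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = 16 * lane;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t sum = (uint16_t)(x + y);          /* wraps modulo 2^16 */
        result |= (uint64_t)sum << shift;
    }
    return result;
}
```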





3.4 Traps


A trap is a vectored transfer of control to privileged or hyperprivileged software
through a trap table that may contain the first 8 instructions (32 for some frequently
used traps) of each trap handler. The base address of the table is established by
software in a state register (the Trap Base Address register, TBA, or the
Hyperprivileged Trap Base Register, HTBA). The displacement within the table is
encoded in the type number of each trap and the level of the trap. Part of the trap
table is reserved for hardware traps, and part of it is reserved for software traps
generated by trap (Tcc) instructions.
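The sketch below shows one way the table displacement can be formed. It assumes the SPARC V9 layout for the privileged trap table: TBA{63:15}, then one bit that is set when the trap is taken while TL > 0, then the 9-bit trap type TT, then five zero bits. That layout gives one 32-byte (8-instruction) entry per trap type. The HTBA case is not modeled, and Chapter 12 remains the authoritative description; the function name is hypothetical.

```c
#include <stdint.h>

/* Illustrative SPARC V9-style trap vector: TBA{63:15} | (TL > 0) | TT{8:0} | 00000.
   tl_at_trap is the trap level in effect when the trap occurs. */
uint64_t trap_vector_address(uint64_t tba, unsigned tl_at_trap, unsigned tt)
{
    uint64_t base   = tba & ~UINT64_C(0x7FFF);                 /* keep TBA{63:15} */
    uint64_t tl_bit = tl_at_trap > 0 ? UINT64_C(1) << 14 : 0;  /* second half of table */
    uint64_t entry  = (uint64_t)(tt & 0x1FFu) << 5;            /* 32 bytes per trap type */
    return base | tl_bit | entry;
}
```

Under this layout, a handler that needs 32 instructions (such as a window spill or fill handler) occupies four consecutive 32-byte entries.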


A trap causes the current PC and NPC to be saved in the TPC and TNPC registers. 
It also causes the CCR, ASI, PSTATE, and CWP registers to be saved in TSTATE. 
TPC, TNPC, and TSTATE are entries in a hardware trap stack, where the number of 
entries in the trap stack is equal to the number of supported trap levels. A trap 
causes hyperprivileged state to be saved in the HTSTATE trap stack. A trap also sets 
bits in the PSTATE (and, in some cases, HPSTATE) register and typically increments 
the GL register. Normally, the CWP is not changed by a trap; on a window spill or 
fill trap, however, the CWP is changed to point to the register window to be saved or 
restored. 


A trap can be caused by a Tcc instruction, an asynchronous exception, an instruction- 
induced exception, or an interrupt request not directly related to a particular 
instruction. Before executing each instruction, a virtual processor determines if there 
are any pending exceptions or interrupt requests. If any are pending, the virtual 
processor selects the highest-priority exception or interrupt request and causes a 
trap. 


See Chapter 12, Traps, for a complete description of traps. 





3.5 Chip-Level Multithreading (CMT)


An UltraSPARC Architecture implementation may include multiple virtual processor 
cores on the same processor module to provide a dense, high-throughput system. 
This may be achieved by having a combination of multiple physical processor cores 
and/or multiple strands (threads) per physical processor core, referred to as chip- 
level multithreaded (CMT) processors. CMT-specific hyperprivileged registers are 
used for identification and configuration of CMT processors. 


The CMT programming model describes a common interface between hardware
(CMT registers) and software.


The common CMT registers and the CMT programming model are described in 
Chapter 15, Chip-Level Multithreading (CMT). 
CHAPTER 4


Data Formats 





The UltraSPARC Architecture recognizes these fundamental data types: 

• Signed integer: 8, 16, 32, and 64 bits

• Unsigned integer: 8, 16, 32, and 64 bits

• SIMD data formats: Uint8 SIMD (32 bits), Int16 SIMD (64 bits), and Int32 SIMD
(64 bits)

• Floating point: 32, 64, and 128 bits


The widths of the data types are as follows: 

Byte: 8 bits 

Halfword: 16 bits 

Word: 32 bits 

Tagged word: 32 bits (30-bit value plus 2-bit tag) 
Doubleword/Extended-word: 64 bits 
Quadword: 128 bits 


The signed integer values are stored as two's-complement numbers with a width 
commensurate with their range. Unsigned integer values, bit vectors, Boolean 
values, character strings, and other values representable in binary form are stored as 
unsigned integers with a width commensurate with their range. The floating-point 
formats conform to the IEEE Standard for Binary Floating-point Arithmetic, IEEE 
Std 754-1985. In tagged words, the least significant two bits are treated as a tag; the 
remaining 30 bits are treated as a signed integer. 


Data formats are described in these sections: 
• Integer Data Formats on page 36.

• Floating-Point Data Formats on page 40.

• SIMD Data Formats on page 43.


Names are assigned to individual subwords of the multiword data formats as
described in these sections:

• Signed Integer Doubleword (64 bits) on page 37.

• Unsigned Integer Doubleword (64 bits) on page 39.

• Floating Point, Double Precision (64 bits) on page 41.

• Floating Point, Quad Precision (128 bits) on page 42.
4.1 Integer Data Formats


TABLE 4-1 describes the width and ranges of the signed, unsigned, and tagged integer 
data formats. 


TABLE 4-1 Signed Integer, Unsigned Integer, and Tagged Format Ranges

Data Type                                    Width (bits)   Range
Signed integer byte                           8             $-2^{7}$ to $2^{7}-1$
Signed integer halfword                      16             $-2^{15}$ to $2^{15}-1$
Signed integer word                          32             $-2^{31}$ to $2^{31}-1$
Signed integer doubleword/extended-word      64             $-2^{63}$ to $2^{63}-1$
Unsigned integer byte                         8             $0$ to $2^{8}-1$
Unsigned integer halfword                    16             $0$ to $2^{16}-1$
Unsigned integer word                        32             $0$ to $2^{32}-1$
Unsigned integer doubleword/extended-word    64             $0$ to $2^{64}-1$
Integer tagged word                          32             $0$ to $2^{30}-1$
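The ranges in TABLE 4-1 follow from the widths. An n-bit two's-complement value spans $-2^{n-1}$ to $2^{n-1}-1$, and an n-bit unsigned value spans $0$ to $2^{n}-1$. The small C check below compares these formulas against the standard <stdint.h> limits; the function names are illustrative.

```c
#include <stdint.h>

/* Range of an n-bit two's-complement integer, 1 <= n <= 64. */
int64_t signed_min(unsigned n) { return n == 64 ? INT64_MIN : -((int64_t)1 << (n - 1)); }
int64_t signed_max(unsigned n) { return n == 64 ? INT64_MAX : ((int64_t)1 << (n - 1)) - 1; }

/* Largest n-bit unsigned integer, 1 <= n <= 64. */
uint64_t unsigned_max(unsigned n) { return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1; }
```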


TABLE 4-2 describes the memory and register alignment for multiword integer data. 
All registers in the integer register file are 64 bits wide, but can be used to contain 
smaller (narrower) data sizes. Note that there is no difference between integer 
extended-words and doublewords in memory; the only difference is how they are 
represented in registers. 


TABLE 4-2 Integer Doubleword / Extended-word Alignment

Subformat  Subformat Field               Required Address  Memory Address   Required Register  Register
Name                                     Alignment         (big-endian)¹    Alignment          Number
SD-0       signed_dbl_integer{63:32}     n mod 8 = 0       n                r mod 2 = 0        r
SD-1       signed_dbl_integer{31:0}      (n+4) mod 8 = 4   n+4              (r+1) mod 2 = 1    r+1
SX         signed_ext_integer{63:0}      n mod 8 = 0       n                none               r
UD-0       unsigned_dbl_integer{63:32}   n mod 8 = 0       n                r mod 2 = 0        r
UD-1       unsigned_dbl_integer{31:0}    (n+4) mod 8 = 4   n+4              (r+1) mod 2 = 1    r+1
UX         unsigned_ext_integer{63:0}    n mod 8 = 0       n                none               r


1. The Memory Address in this table applies to big-endian memory accesses. Word and byte order are reversed when little-endian
accesses are used.
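The following C sketch shows the big-endian layout in TABLE 4-2: a 64-bit integer stored at an 8-byte-aligned address n has bits 63:32 (SD-0/UD-0) in the word at n and bits 31:0 (SD-1/UD-1) in the word at n+4. The helper names are hypothetical, and memory is modeled as a plain byte array.

```c
#include <stdint.h>

/* Store a 64-bit value most-significant byte first (big-endian). */
void store_be64(uint8_t *mem, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load one big-endian 32-bit word. */
uint32_t load_be32(const uint8_t *mem)
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}
```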
The data types are illustrated in the following subsections.


4.1.1 Signed Integer Data Types


Figures in this section illustrate the following signed data types: 


• Signed integer byte

• Signed integer halfword

• Signed integer word

• Signed integer doubleword

• Signed integer extended-word


4.1.1.1 Signed Integer Byte, Halfword, and Word


FIGURE 4-1 illustrates the signed integer byte, halfword, and word data formats.


SB: S (bit 7), bits 6:0
SH: S (bit 15), bits 14:0
SW: S (bit 31), bits 30:0


FIGURE 4-1 Signed Integer Byte, Halfword, and Word Data Formats


4.1.1.2 Signed Integer Doubleword (64 bits)


FIGURE 4-2 illustrates both components (SD-0 and SD-1) of the signed integer double
data format.


SD-0: S (bit 31), signed_int_doubleword{62:32} (bits 30:0)
SD-1: signed_int_doubleword{31:0} (bits 31:0)


FIGURE 4-2 Signed Integer Double Data Format


CHAPTER 4 * Data Formats 37 


4.1.1.3 Signed Integer Extended-Word (64 bits) 


FIGURE 4-3 illustrates the signed integer extended-word (SX) data format. 


SX: S (bit 63), signed_int_extended (bits 62:0)


FIGURE 4-3 Signed Integer Extended-Word Data Format


4.1.2 Unsigned Integer Data Types 


Figures in this section illustrate the following unsigned data types: 


• Unsigned integer byte

• Unsigned integer halfword

• Unsigned integer word

• Unsigned integer doubleword

• Unsigned integer extended-word


4.1.2.1 Unsigned Integer Byte, Halfword, and Word 


FIGURE 4-4 illustrates the unsigned integer byte, halfword, and word data formats.


UB: bits 7:0
UH: bits 15:0
UW: bits 31:0


FIGURE 4-4 Unsigned Integer Byte, Halfword, and Word Data Formats
4.1.2.2 Unsigned Integer Doubleword (64 bits)


FIGURE 4-5 illustrates both components (UD-0 and UD-1) of the unsigned integer
double data format.


UD-0: unsigned_int_doubleword{63:32} (bits 31:0)
UD-1: unsigned_int_doubleword{31:0} (bits 31:0)


FIGURE 4-5 Unsigned Integer Double Data Format


4.1.2.3 Unsigned Extended Integer (64 bits) 


FIGURE 4-6 illustrates the unsigned extended integer (UX) data format. 


UX: unsigned_int_extended (bits 63:0)


FIGURE 4-6 Unsigned Extended Integer Data Format


4.1.3 Tagged Word (32 bits) 


FIGURE 4-7 illustrates the tagged word data format.


TW: bits 31:2, tag (bits 1:0)


FIGURE 4-7 Tagged Word Data Format
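Tagged words exist to support tagged arithmetic. This sketch assumes the SPARC V9 definition of tagged add (TADDcc): the 32-bit overflow condition is reported if either operand has a nonzero 2-bit tag, or if the 32-bit signed addition overflows. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a tagged word: value in bits 31:2, tag in bits 1:0. */
uint32_t make_tagged(int32_t value, unsigned tag)
{
    return ((uint32_t)value << 2) | (tag & 3u);
}

/* True if a TADDcc-style add of a and b would report overflow. */
bool tagged_add_overflows(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                              /* wraps modulo 2^32 */
    bool tag_nonzero = ((a | b) & 0x3u) != 0;          /* either tag field nonzero */
    bool signed_ovf = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;  /* same-sign inputs, sign flipped */
    return tag_nonzero || signed_ovf;
}
```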
4.2 Floating-Point Data Formats


Single-precision, double-precision, and quad-precision floating-point data types are
described below.


4.2.1 Floating Point, Single Precision (32 bits)


FIGURE 4-8 illustrates the floating-point single-precision data format, and TABLE 4-3 
describes the formats. 


FS: S (bit 31), exp{7:0} (bits 30:23), fraction{22:0} (bits 22:0)


FIGURE 4-8 Floating-Point Single-Precision Data Format


TABLE 4-3 Floating-Point Single-Precision Format Definition

s = sign (1 bit)
e = biased exponent (8 bits)
f = fraction (23 bits)
u = undefined

Normalized value (0 < e < 255):   $(-1)^{s} \times 2^{e-127} \times 1.f$
Subnormal value (e = 0):          $(-1)^{s} \times 2^{-126} \times 0.f$
Zero (e = 0, f = 0):              $(-1)^{s} \times 0$
Signalling NaN:                   s = u; e = 255 (max); f = .0uu…uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 255 (max); f = .1uu…uu
−∞ (negative infinity):           s = 1; e = 255 (max); f = .000…00
+∞ (positive infinity):           s = 0; e = 255 (max); f = .000…00
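TABLE 4-3 translates directly into a classifier over the raw 32-bit pattern. This sketch assumes the host float type is IEEE 754 single precision. It takes bit patterns rather than float arguments so that signalling NaNs are not quieted in transit. The enum and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef enum { FS_ZERO, FS_SUBNORMAL, FS_NORMAL, FS_INFINITY,
               FS_QUIET_NAN, FS_SIGNALLING_NAN } fs_class;

/* Reinterpret a float's 32 bits (assumes IEEE 754 single precision). */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Classify using e = bits 30:23 and f = bits 22:0 (the sign, bit 31, does not matter here). */
fs_class classify_single(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t f = bits & 0x7FFFFFu;
    if (e == 0)
        return f == 0 ? FS_ZERO : FS_SUBNORMAL;
    if (e == 255) {
        if (f == 0)
            return FS_INFINITY;
        return (f & 0x400000u) ? FS_QUIET_NAN : FS_SIGNALLING_NAN;  /* leading fraction bit */
    }
    return FS_NORMAL;
}
```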





40 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


4.2.2 Floating Point, Double Precision (64 bits)


FIGURE 4-9 illustrates both components (FD-0 and FD-1) of the floating-point double- 
precision data format, and TABLE 4-4 describes the formats. 


FD-0: S (bit 31), exp{10:0} (bits 30:20), fraction{51:32} (bits 19:0)
FD-1: fraction{31:0} (bits 31:0)


FIGURE 4-9 Floating-Point Double-Precision Data Format


TABLE 4-4 Floating-Point Double-Precision Format Definition

s = sign (1 bit)
e = biased exponent (11 bits)
f = fraction (52 bits)
u = undefined

Normalized value (0 < e < 2047):  $(-1)^{s} \times 2^{e-1023} \times 1.f$
Subnormal value (e = 0):          $(-1)^{s} \times 2^{-1022} \times 0.f$
Zero (e = 0, f = 0):              $(-1)^{s} \times 0$
Signalling NaN:                   s = u; e = 2047 (max); f = .0uu…uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 2047 (max); f = .1uu…uu
−∞ (negative infinity):           s = 1; e = 2047 (max); f = .000…00
+∞ (positive infinity):           s = 0; e = 2047 (max); f = .000…00
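The following C sketch shows how a double splits into the two 32-bit components of FIGURE 4-9: FD-0 holds s, exp{10:0}, and fraction{51:32}, and FD-1 holds fraction{31:0}. It assumes the host double type is IEEE 754 double precision; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into FD-0 (high word) and FD-1 (low word). */
void split_double(double x, uint32_t *fd0, uint32_t *fd1)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    *fd0 = (uint32_t)(bits >> 32);
    *fd1 = (uint32_t)bits;
}

/* Biased exponent e, held in bits 30:20 of FD-0. */
uint32_t fd0_exponent(uint32_t fd0) { return (fd0 >> 20) & 0x7FFu; }
```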
4.2.3 Floating Point, Quad Precision (128 bits)


FIGURE 4-10 illustrates all four components (FQ-0 through FQ-3) of the floating-point
quad-precision data format, and TABLE 4-5 describes the formats.


FQ-0: S (bit 31), exp{14:0} (bits 30:16), fraction{111:96} (bits 15:0)
FQ-1: fraction{95:64} (bits 31:0)
FQ-2: fraction{63:32} (bits 31:0)
FQ-3: fraction{31:0} (bits 31:0)


FIGURE 4-10 Floating-Point Quad-Precision Data Format


TABLE 4-5 Floating-Point Quad-Precision Format Definition

s = sign (1 bit)
e = biased exponent (15 bits)
f = fraction (112 bits)
u = undefined

Normalized value (0 < e < 32767): $(-1)^{s} \times 2^{e-16383} \times 1.f$
Subnormal value (e = 0):          $(-1)^{s} \times 2^{-16382} \times 0.f$
Zero (e = 0, f = 0):              $(-1)^{s} \times 0$
Signalling NaN:                   s = u; e = 32767 (max); f = .0uu…uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 32767 (max); f = .1uu…uu
−∞ (negative infinity):           s = 1; e = 32767 (max); f = .000…00
+∞ (positive infinity):           s = 0; e = 32767 (max); f = .000…00
4.2.4 Floating-Point Data Alignment in Memory and Registers


TABLE 4-6 describes the memory and register alignment for floating-point data.











TABLE 4-6 Floating-Point Doubleword and Quadword Alignment

Subformat  Subformat Field                 Required Address  Memory Address   Required Register  Register
Name                                       Alignment         (big-endian)*    Alignment          Number
FD-0       s:exp{10:0}:fraction{51:32}     0 mod 4 †         n                0 mod 2            f
FD-1       fraction{31:0}                  0 mod 4           n+4              1 mod 2            f+1 ◊
FQ-0       s:exp{14:0}:fraction{111:96}    0 mod 4 ‡         n                0 mod 4            f
FQ-1       fraction{95:64}                 0 mod 4           n+4              1 mod 4            f+1 ◊
FQ-2       fraction{63:32}                 0 mod 4           n+8              2 mod 4            f+2
FQ-3       fraction{31:0}                  0 mod 4           n+12             3 mod 4            f+3 ◊


* The Memory Address in this table applies to big-endian memory accesses. Word and byte order are reversed when little-endian
accesses are used.

† Although a floating-point doubleword is required only to be word-aligned in memory, it is recommended that it be
doubleword-aligned (that is, the address of its FD-0 word should be 0 mod 8 so that it can be accessed with doubleword
loads/stores instead of multiple singleword loads/stores).

‡ Although a floating-point quadword is required only to be word-aligned in memory, it is recommended that it be
quadword-aligned (that is, the address of its FQ-0 word should be 0 mod 16).

◊ Note that this 32-bit floating-point register is only directly addressable in the lower half of the register file (that is, if its
register number is ≤ 31).
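The rules in TABLE 4-6 can be written as simple predicates. In the sketch below, n is the byte address of the FD-0 or FQ-0 word and f is the number of the first 32-bit floating-point register. The function names are illustrative. The model separates what is required (word alignment) from what the footnotes merely recommend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte address of component k (FD-k or FQ-k) when component 0 is at n. */
uint64_t fp_component_address(uint64_t n, unsigned k) { return n + 4u * k; }

/* Required for every component: word alignment. */
bool fp_address_required_ok(uint64_t n) { return n % 4 == 0; }

/* Recommended by the table footnotes: doubleword / quadword alignment. */
bool fd_address_recommended(uint64_t n) { return n % 8 == 0; }
bool fq_address_recommended(uint64_t n) { return n % 16 == 0; }

/* Register alignment of the first 32-bit register number f. */
bool fd_register_ok(unsigned f) { return f % 2 == 0; }
bool fq_register_ok(unsigned f) { return f % 4 == 0; }
```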





4.3 SIMD Data Formats


SIMD (single instruction/multiple data) instructions perform identical operations on 
multiple data contained ("packed") in each source operand. This section describes 
the data formats used by SIMD instructions. 


Conversion between the different SIMD data formats can be achieved through SIMD 
multiplication or by the use of the SIMD data formatting instructions. 
Programming | The SIMD data formats can be used in graphics calculations to
Note | represent intensity values for an image (e.g., α, B, G, R).

Intensity values are typically grouped in one of two ways, when
using SIMD data formats:

• Band interleaved images, with the various color components
of a point in the image stored together, and

• Band sequential images, with all of the values for one color
component stored together.


4.3.1 Uint8 SIMD Data Format


The Uint8 SIMD data format consists of four unsigned 8-bit integers contained in a
32-bit word (see FIGURE 4-11).


Uint8 SIMD: value (bits 31:24) | value (bits 23:16) | value (bits 15:8) | value (bits 7:0)


FIGURE 4-11 Uint8 SIMD Data Format


4.3.2 Int16 SIMD Data Formats


The Int16 SIMD data format consists of four signed 16-bit integers contained in a
64-bit word (see FIGURE 4-12).


Int16 SIMD: S (bit 63), value (bits 62:48) | S (bit 47), value (bits 46:32) |
            S (bit 31), value (bits 30:16) | S (bit 15), value (bits 14:0)


FIGURE 4-12 Int16 SIMD Data Format


4.3.3 Int32 SIMD Data Format


The Int32 SIMD data format consists of two signed 32-bit integers contained in a
64-bit word (see FIGURE 4-13).


Int32 SIMD: S (bit 63), value (bits 62:32) | S (bit 31), value (bits 30:0)


FIGURE 4-13 Int32 SIMD Data Format
Programming | The integer SIMD data formats can be used to hold fixed-point
Note | data. The position of the binary point in a SIMD datum is
implied by the programmer and does not influence the
computations performed by instructions that operate on that
SIMD data format.
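Because the binary point is only implied, the same packed Int16 SIMD bits can be read either as plain integers or as fixed-point values. This C sketch makes two assumptions: lane 0 occupies bits 15:0, and the programmer has chosen an illustrative interpretation with 8 fraction bits. Nothing in the data format records that choice. The helper names are hypothetical.

```c
#include <stdint.h>

/* Pack four signed 16-bit values; lane 0 goes to bits 15:0. */
uint64_t pack_int16x4(const int16_t v[4])
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++)
        r |= (uint64_t)(uint16_t)v[i] << (16 * i);
    return r;
}

/* Extract lane i (0..3) as a signed value; the lane's sign bit is its bit 15. */
int32_t int16_lane(uint64_t packed, unsigned i)
{
    uint32_t raw = (uint32_t)(packed >> (16 * i)) & 0xFFFFu;
    return (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* Programmer-chosen fixed-point reading with 8 fraction bits. */
double fixed_8_value(int32_t lane) { return lane / 256.0; }
```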
CHAPTER 5


Registers 





The following registers are described in this chapter: 

• General-Purpose R Registers on page 49.

• Floating-Point Registers on page 55.

• Floating-Point State Register (FSR) on page 61.

• Ancillary State Registers on page 70. The following registers are included in this
category:

    – 32-bit Multiply/Divide Register (Y) (ASR 0) on page 72.
    – Integer Condition Codes Register (CCR) (ASR 2) on page 72.
    – Address Space Identifier (ASI) Register (ASR 3) on page 74.
    – Tick (TICK) Register (ASR 4) on page 74.
    – Program Counters (PC, NPC) (ASR 5) on page 76.
    – Floating-Point Registers State (FPRS) Register (ASR 6) on page 76.
    – Performance Control Register (PCRᴾ) (ASR 16) on page 78.
    – Performance Instrumentation Counter (PIC) Register (ASR 17) on page 79.
    – General Status Register (GSR) (ASR 19) on page 80.
    – SOFTINTᴾ Register (ASRs 20, 21, 22) on page 81.
    – SOFTINT_SETᴾ Pseudo-Register (ASR 20) on page 82.
    – SOFTINT_CLRᴾ Pseudo-Register (ASR 21) on page 83.
    – Tick Compare (TICK_CMPRᴾ) Register (ASR 23) on page 83.
    – System Tick (STICK) Register (ASR 24) on page 84.
    – System Tick Compare (STICK_CMPRᴾ) Register (ASR 25) on page 85.

• Register-Window PR State Registers on page 86. The following registers are
included in this subcategory:

    – Current Window Pointer (CWPᴾ) Register (PR 9) on page 87.
    – Savable Windows (CANSAVEᴾ) Register (PR 10) on page 87.
    – Restorable Windows (CANRESTOREᴾ) Register (PR 11) on page 88.
    – Clean Windows (CLEANWINᴾ) Register (PR 12) on page 88.
    – Other Windows (OTHERWINᴾ) Register (PR 13) on page 88.
    – Window State (WSTATEᴾ) Register (PR 14) on page 89.

• Non-Register-Window PR State Registers on page 91. The following registers are
included in this subcategory:

    – Trap Program Counter (TPCᴾ) Register (PR 0) on page 91.
    – Trap Next PC (TNPCᴾ) Register (PR 1) on page 92.
    – Trap State (TSTATEᴾ) Register (PR 2) on page 93.
    – Trap Type (TTᴾ) Register (PR 3) on page 94.
    – Trap Base Address (TBAᴾ) Register (PR 5) on page 95.
    – Processor State (PSTATEᴾ) Register (PR 6) on page 95.
    – Trap Level Register (TLᴾ) (PR 7) on page 100.
    – Processor Interrupt Level (PILᴾ) Register (PR 8) on page 102.
    – Global Level Register (GLᴾ) (PR 16) on page 102.

• HPR State Registers on page 104. The following registers are included in this
category:

    – Hyperprivileged State (HPSTATEᴴ) Register (HPR 0) on page 105.
    – Hyperprivileged Trap State (HTSTATEᴴ) Register (HPR 1) on page 106.
    – Hyperprivileged Interrupt Pending (HINTPᴴ) Register (HPR 3) on page 107.
    – Hyperprivileged Implementation Version (HVERᴴ) Register (HPR 6) on page 109.
    – Hyperprivileged System Tick Compare (HSTICK_CMPRᴴ) Register (HPR 31) on page 110.


There are additional registers that may be accessed through ASIs; those registers are 
described in Chapter 10, Address Space Identifiers (ASIs). 





5.1 Reserved Register Fields 


For convenience, some registers in this chapter are illustrated as fewer than 64 bits 
wide. Any bits not shown (or explicitly marked as reserved) are reserved for future 
extensions to the architecture. 


Such a reserved field within a register reads as zero in current implementations and, 
when written by software, should only be written with the value of that field 
previously read from that register or with the value zero. 


Programming | Software intended to run on future versions of the UltraSPARC
Note | Architecture should not assume that reserved register fields will
read as 0 or any other particular value.
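The read-modify-write discipline for reserved fields can be sketched as follows. The register, field position, and width are hypothetical. The point is that every bit outside the field being changed, including reserved bits, is written back exactly as it was read.

```c
#include <stdint.h>

/* Replace only the field occupying bits [shift, shift + width) of a previously
   read register value; all other bits are preserved. Assumes shift + width <= 64. */
uint64_t update_field(uint64_t previously_read, unsigned shift, unsigned width,
                      uint64_t field_value)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1) << shift;
    return (previously_read & ~mask) | ((field_value << shift) & mask);
}
```

A caller would read the register, pass that value to update_field, and write the result back; it would never construct the whole register image from scratch.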
5.2 General-Purpose R Registers


An UltraSPARC Architecture virtual processor contains an array of general-purpose
64-bit R registers. The array is partitioned into MAXGL + 1 sets of eight global
registers, plus N_REG_WINDOWS groups of 16 registers each. The value of
N_REG_WINDOWS in an UltraSPARC Architecture implementation falls within the
range 3 to 32 (inclusive).


One set of 8 global registers is always visible. At any given time, a group of 24 
registers, known as a register window, is also visible. A register window comprises 
the 16 registers from the current 16-register group (referred to as 8 in registers and 8 
local registers), plus half of the registers from the next 16-register group (referred to 
as 8 out registers). See FIGURE 5-1. 


SPARC instructions use 5-bit fields to reference R registers. That is, 32 R registers are 
visible to software at any moment. Which 32 out of the full set of R registers are 
visible is described in the following sections. The visible 32 R registers are named 
R[0] through R[31], illustrated in FIGURE 5-1. 


Global R Registers 


Registers R[0] - R[7] refer to a set of eight registers called the global registers (labeled
g0 through g7). At any time, one of MAXGL + 1 sets of eight registers is enabled and
can be accessed as the current set of global registers. The currently enabled set of
global registers is selected by the GL register. See Global Level Register (GLᴾ) (PR 16)
on page 102.


Global register zero (G0) always reads as zero; writes to it have no software-visible 
effect. 


Windowed R Registers 


A set of 24 R registers that is visible as R[8]-R[31] at any given time is called a
"register window". The registers that become R[8]-R[15] in a register window are
called the out registers of the window. Note that the in registers of a register window
become the out registers of an adjacent register window. See TABLE 5-1 and
FIGURE 5-2.
R[24]-R[31]   ins
R[16]-R[23]   locals
R[8]-R[15]    outs
R[0]-R[7]     globals


FIGURE 5-1 General-Purpose Registers (as Visible at Any Given Time)


The names in, local, and out originate from the fact that the out registers are typically 
used to pass parameters from (out of) a calling routine and that the called routine 
receives those parameters as its in registers. 


TABLE 5-1 Window Addressing

Windowed Register Address   R Register Address
in[0]-in[7]                 R[24]-R[31]
local[0]-local[7]           R[16]-R[23]
out[0]-out[7]               R[8]-R[15]
global[0]-global[7]         R[0]-R[7]


V9 Compatibility | In the SPARC V9 architecture, the number of 16-register
Note | windowed register sets, N_REG_WINDOWS, ranges from 3 to 32
(impl. dep. #2-V8). The maximum global register set index in the
UltraSPARC Architecture, MAXGL, ranges from 2 to 15. The
number of implemented global register sets is MAXGL + 1. The
total number of R registers in a given UltraSPARC Architecture
implementation is:

$(N\_REG\_WINDOWS \times 16) + ((MAXGL + 1) \times 8)$

Therefore, an UltraSPARC Architecture processor may contain
from 72 to 640 R registers.
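Evaluating the formula at its extremes reproduces the stated bounds. The minimum is $3 \times 16 + 3 \times 8 = 72$ and the maximum is $32 \times 16 + 16 \times 8 = 640$. The trivial helper below is illustrative.

```c
/* Total R registers = (N_REG_WINDOWS x 16) + ((MAXGL + 1) x 8). */
unsigned total_r_registers(unsigned n_reg_windows, unsigned maxgl)
{
    return n_reg_windows * 16u + (maxgl + 1u) * 8u;
}
```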
The current window in the windowed portion of R registers is indicated by the
current window pointer (CWP) register. The CWP is decremented by the RESTORE
instruction and incremented by the SAVE instruction.


Window (CWP − 1):  ins R[24]-R[31], locals R[16]-R[23], outs R[8]-R[15]
Window (CWP):      ins R[24]-R[31] (the outs of window CWP − 1), locals R[16]-R[23], outs R[8]-R[15]
Window (CWP + 1):  ins R[24]-R[31] (the outs of window CWP), locals R[16]-R[23], outs R[8]-R[15]
(Each register is 64 bits wide, bits 63:0.)

FIGURE 5-2 Three Overlapping Windows and Eight Global Registers


Overlapping Windows. Each window shares its ins with one adjacent window
and its outs with another. The outs of the CWP − 1 (modulo N_REG_WINDOWS)
window are addressable as the ins of the current window, and the outs in the current
window are the ins of the CWP + 1 (modulo N_REG_WINDOWS) window. The locals
are unique to each window.


Register address $o$, where $8 \le o \le 15$, refers to exactly the same out register before the
register window is advanced by a SAVE instruction (CWP is incremented by 1
(modulo N_REG_WINDOWS)) as does register address $o + 16$ after the register window
is advanced. Likewise, register address $i$, where $24 \le i \le 31$, refers to exactly the same
in register before the register window is restored by a RESTORE instruction (CWP is
decremented by 1 (modulo N_REG_WINDOWS)) as does register address $i - 16$ after the
window is restored. See FIGURE 5-2 on page 52 and FIGURE 5-3 on page 54.
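The overlap rule can be checked with a small model that maps a visible register number in window cwp to an index in a flat array of N_REG_WINDOWS × 16 windowed registers. The layout is a hypothetical illustration, not a description of real hardware. Group g holds window g's ins at offsets 0 to 7 and its locals at offsets 8 to 15, and window g's outs are group (g + 1)'s ins.

```c
/* Index of visible register r (8..31) of window cwp in a hypothetical flat array
   of n_windows * 16 windowed registers; returns -1 for globals or bad input. */
int windowed_index(unsigned cwp, unsigned r, unsigned n_windows)
{
    if (r < 8 || r > 31 || cwp >= n_windows)
        return -1;
    if (r >= 24)                                              /* ins */
        return (int)(cwp * 16 + (r - 24));
    if (r >= 16)                                              /* locals */
        return (int)(cwp * 16 + 8 + (r - 16));
    return (int)(((cwp + 1) % n_windows) * 16 + (r - 8));    /* outs = next window's ins */
}
```

With this mapping, windowed_index(cwp, o, n) equals windowed_index(cwp + 1, o + 16, n) for every out register o, which is the SAVE rule stated above; the modulo also makes the outs of window N_REG_WINDOWS − 1 the ins of window 0.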


To application software, the virtual processor appears to provide an infinitely-deep 
stack of register windows. 


Programming | Since the procedure call instructions (CALL and JMPL) do not
Note | change the CWP, a procedure can be called without changing
the window. See the section "Leaf-Procedure Optimization" in
Software Considerations, contained in the separate volume
UltraSPARC Architecture Application Notes.


Since CWP arithmetic is performed modulo N_REG_WINDOWS, the highest-numbered
implemented window overlaps with window 0. The outs of window
N_REG_WINDOWS − 1 are the ins of window 0. Implemented windows are numbered
contiguously from 0 through N_REG_WINDOWS − 1.


Because the windows overlap, the number of windows available to software is 1 less
than the number of implemented windows; that is, N_REG_WINDOWS − 1. When the
register file is full, the outs of the newest window are the ins of the oldest window,
which still contains valid data.


Window overflow is detected by the CANSAVE register, and window underflow is 
detected by the CANRESTORE register, both of which are controlled by privileged 
software. A window overflow (underflow) condition causes a window spill (fill) 
trap. 


When a new register window is made visible through use of a SAVE instruction, the
local and out registers are guaranteed to contain either zeroes or valid data from the
current context. If software executes a RESTORE and later executes a SAVE, then the
contents of the resulting window's local and out registers are not guaranteed to be
preserved between the RESTORE and the SAVE¹. Those registers may even have
been written with "dirty" data, that is, data created by software running in a
different context. However, if the clean window protocol is being used, system
software must guarantee that registers in the current window after a SAVE always
contain only zeroes or valid data from that context. See Clean Windows
(CLEANWINᴾ) Register (PR 12) on page 88, Savable Windows (CANSAVEᴾ) Register
(PR 10) on page 87, and Restorable Windows (CANRESTOREᴾ) Register (PR 11) on
page 88.


Implementation | An UltraSPARC Architecture virtual processor supports the 
Note | guarantee in the preceding paragraph of "either zeroes or valid 
data from the current context"; it may do so either in hardware 
or in a combination of hardware and system software. 


1. For example, any of those 16 registers might be altered due to the occurrence of a trap between the RESTORE
and the SAVE, or might be altered during the RESTORE operation due to the way that register windows are
implemented. After a RESTORE instruction executes, software must assume that the values of the affected 16
registers from before the RESTORE are unrecoverable.
Register Window Management Instructions on page 131 describes how the windowed
integer registers are managed.


CWP = 0 (CURRENT WINDOW POINTER)

[Figure: windows w0 through w7, showing the locals of w0 and w5, CANSAVE = 4,
CANRESTORE = 1, and the overlap window]

CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS − 2

The current window (window 0) and the overlap window (window 5) account for the
two windows in the right side of the equation. The "overlap window" is the window
that must remain unused because its ins and outs overlap two other valid windows.


FIGURE 5-3 Windowed R Registers for N_REG_WINDOWS = 8


54 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 




In FIGURE 5-3, N_REG_WINDOWS = 8. The eight global registers are not illustrated.
CWP = 0, CANSAVE = 4, OTHERWIN = 1, and CANRESTORE = 1. If the procedure 
using window w0 executes a RESTORE, then window w7 becomes the current 
window. If the procedure using window w0 executes a SAVE, then window w1 
becomes the current window. 
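Using the values from FIGURE 5-3, the window movement and the counter equation can be checked directly. This is a hedged sketch: the function names are hypothetical, and only the wrap-around of CWP modulo N_REG_WINDOWS and the counter equation come from the text.

```c
/* N_REG_WINDOWS as in FIGURE 5-3. */
enum { N_REG_WINDOWS = 8 };

/* Window that becomes current after SAVE or RESTORE; CWP wraps around. */
unsigned cwp_after_save(unsigned cwp)    { return (cwp + 1) % N_REG_WINDOWS; }
unsigned cwp_after_restore(unsigned cwp) { return (cwp + N_REG_WINDOWS - 1) % N_REG_WINDOWS; }

/* The counter equation of FIGURE 5-3: the current window and the overlap
 * window are the two windows not counted by the three registers. */
int window_counts_consistent(unsigned cansave, unsigned canrestore, unsigned otherwin) {
    return cansave + canrestore + otherwin == N_REG_WINDOWS - 2;
}
```

With CWP = 0, `cwp_after_restore(0)` yields 7 and `cwp_after_save(0)` yields 1, matching the w7 and w1 cases above, and the counters 4, 1, and 1 satisfy the equation.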


Special R Registers 


The use of two of the R registers is fixed, in whole or in part, by the architecture: 
• The value of R[0] is always zero; writes to it have no program-visible effect.


• The CALL instruction writes its own address into register R[15] (out register 7).


Register-Pair Operands. LDTW, LDTWA, STTW, and STTWA instructions access 
a pair of words ("twin words") in adjacent R registers and require even-odd register 
alignment. The least significant bit of an R register number in these instructions is 
unused and must always be supplied as 0 by software. 


When the R[0]-R[1] register pair is used as a destination in LDTW or LDTWA, only 
R[1] is modified. When the R[0]-R[1] register pair is used as a source in STTW or 
STTWA, 0 is read from R[0], so 0 is written to the 32-bit word at the lowest address, 
and the least significant 32 bits of R[1] are written to the 32-bit word at the highest 
address. 


An attempt to execute an LDTW, LDTWA, STTW, or STTWA instruction that refers
to a misaligned (odd) destination register number causes an illegal_instruction trap.
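The R[0]-R[1] special case and the even-register rule can be expressed compactly. The sketch below is hypothetical: it uses a flat 32-entry register array (register windows are ignored), passes memory as the two 32-bit words at the lower and higher addresses, and assumes, as in SPARC V9, that each loaded word is zero-extended to 64 bits. `ldtw` and `sttw` are illustrative helpers, not real APIs.

```c
#include <stdint.h>

/* Hypothetical flat view of the 32 visible R registers (windows ignored). */
typedef struct { uint64_t r[32]; } regfile;

/* LDTW: mem[0] is the word at the lower address, mem[1] the word at the
 * higher address. Returns -1 where an illegal_instruction trap would occur. */
int ldtw(regfile *rf, unsigned rd, const uint32_t mem[2]) {
    if (rd > 31 || (rd & 1u))
        return -1;                /* odd register number */
    if (rd != 0)
        rf->r[rd] = mem[0];       /* writes to R[0] have no effect */
    rf->r[rd + 1] = mem[1];       /* zero-extended to 64 bits */
    return 0;
}

/* STTW: stores the low 32 bits of R[rd] and R[rd+1]; R[0] reads as 0. */
int sttw(const regfile *rf, unsigned rd, uint32_t mem[2]) {
    if (rd > 31 || (rd & 1u))
        return -1;
    mem[0] = (rd == 0) ? 0u : (uint32_t)rf->r[rd];
    mem[1] = (uint32_t)rf->r[rd + 1];
    return 0;
}
```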





5.3 Floating-Point Registers


The floating-point register set consists of sixty-four 32-bit registers, which may be 
accessed as follows: 


• Sixteen 128-bit quad-precision registers, referenced as FQ[0], FQ[4], ..., FQ[60]
• Thirty-two 64-bit double-precision registers, referenced as FD[0], FD[2], ..., FD[62]


• Thirty-two 32-bit single-precision registers, referenced as FS[0], FS[1], ..., FS[31]
(only the lower half of the floating-point register file can be accessed as
single-precision registers)


The floating-point registers are arranged so that some of them overlap, that is, are 
aliased. The layout and numbering of the floating-point registers are shown in 
TABLE 5-2. Unlike the windowed R registers, all of the floating-point registers are 
accessible at any time. The floating-point registers can be read and written by 
floating-point operate (FPop1/FPop2 format) instructions, by load/store
single/double/quad floating-point instructions, by VIS™ instructions, and by block
load and block store instructions.


TABLE 5-2 Floating-Point Registers, with Aliasing

Single Precision (32-bit)     Double Precision (64-bit)     Quad Precision (128-bit)
Register  Assembly  Bits      Register  Assembly  Bits      Register  Assembly
          Language                      Language                      Language
FS[0]     %f0       63:32     FD[0]     %d0       127:64    FQ[0]     %q0
FS[1]     %f1       31:0
FS[2]     %f2       63:32     FD[2]     %d2       63:0
FS[3]     %f3       31:0
FS[4]     %f4       63:32     FD[4]     %d4       127:64    FQ[4]     %q4
FS[5]     %f5       31:0
FS[6]     %f6       63:32     FD[6]     %d6       63:0
FS[7]     %f7       31:0
FS[8]     %f8       63:32     FD[8]     %d8       127:64    FQ[8]     %q8
FS[9]     %f9       31:0
FS[10]    %f10      63:32     FD[10]    %d10      63:0
FS[11]    %f11      31:0
FS[12]    %f12      63:32     FD[12]    %d12      127:64    FQ[12]    %q12
FS[13]    %f13      31:0
FS[14]    %f14      63:32     FD[14]    %d14      63:0
FS[15]    %f15      31:0
FS[16]    %f16      63:32     FD[16]    %d16      127:64    FQ[16]    %q16
FS[17]    %f17      31:0
FS[18]    %f18      63:32     FD[18]    %d18      63:0
FS[19]    %f19      31:0
FS[20]    %f20      63:32     FD[20]    %d20      127:64    FQ[20]    %q20
FS[21]    %f21      31:0
FS[22]    %f22      63:32     FD[22]    %d22      63:0
FS[23]    %f23      31:0
FS[24]    %f24      63:32     FD[24]    %d24      127:64    FQ[24]    %q24
FS[25]    %f25      31:0
FS[26]    %f26      63:32     FD[26]    %d26      63:0
FS[27]    %f27      31:0
FS[28]    %f28      63:32     FD[28]    %d28      127:64    FQ[28]    %q28
FS[29]    %f29      31:0
FS[30]    %f30      63:32     FD[30]    %d30      63:0
FS[31]    %f31      31:0
                              FD[32]    %d32      127:64    FQ[32]    %q32
                              FD[34]    %d34      63:0
                              FD[36]    %d36      127:64    FQ[36]    %q36
                              FD[38]    %d38      63:0
                              FD[40]    %d40      127:64    FQ[40]    %q40
                              FD[42]    %d42      63:0
                              FD[44]    %d44      127:64    FQ[44]    %q44
                              FD[46]    %d46      63:0
                              FD[48]    %d48      127:64    FQ[48]    %q48
                              FD[50]    %d50      63:0
                              FD[52]    %d52      127:64    FQ[52]    %q52
                              FD[54]    %d54      63:0
                              FD[56]    %d56      127:64    FQ[56]    %q56
                              FD[58]    %d58      63:0
                              FD[60]    %d60      127:64    FQ[60]    %q60
                              FD[62]    %d62      63:0
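The aliasing in TABLE 5-2 follows a simple pattern: within each larger register, the register with the lower number occupies the more significant bits. The helpers below are a hypothetical sketch that computes, for a given register number, which larger register contains it and which bits it occupies.

```c
/* Position of a register inside the next-larger register: the number of
 * the containing register and the bit range it occupies (msb:lsb). */
typedef struct { unsigned reg, msb, lsb; } fp_alias;

/* FS[n], n in 0..31: the even single is the more significant half of FD[n & ~1]. */
fp_alias single_in_double(unsigned n) {
    fp_alias a = { n & ~1u, (n & 1u) ? 31u : 63u, (n & 1u) ? 0u : 32u };
    return a;
}

/* FD[n], n even in 0..62: FD[4k] is bits 127:64 and FD[4k+2] is bits 63:0 of FQ[4k]. */
fp_alias double_in_quad(unsigned n) {
    fp_alias a = { n & ~3u, (n & 2u) ? 63u : 127u, (n & 2u) ? 0u : 64u };
    return a;
}
```

For example, `single_in_double(23)` reports FD[22] bits 31:0, and `double_in_quad(22)` reports FQ[20] bits 63:0, matching the corresponding rows of the table.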











5.3.1 Floating-Point Register Number Encoding 


Register numbers for single, double, and quad registers are encoded differently in 
the 5-bit register number field of a floating-point instruction. If the bits in a register 
number field are labeled b{4} ... b{0} (where b{4} is the most significant bit of the 
register number), the encoding of floating-point register numbers into 5-bit 
instruction fields is as given in TABLE 5-3. 


TABLE 5-3 Floating-Point Register Number Encoding

Register        Full 6-bit Register Number        Encoding in a 5-bit Register
Operand Type                                      Field in an Instruction
Single          0 b{4} b{3} b{2} b{1} b{0}        b{4} b{3} b{2} b{1} b{0}
Double          b{5} b{4} b{3} b{2} b{1} 0        b{4} b{3} b{2} b{1} b{5}
Quad            b{5} b{4} b{3} b{2} 0 0           b{4} b{3} b{2} 0 b{5}
SPARC V8 | In the SPARC V8 architecture, bit 0 of double and quad register
Compatibility | numbers encoded in instruction fields was required to be zero. 
Note | Therefore, all SPARC V8 floating-point instructions can run 
unchanged on an UltraSPARC Architecture virtual processor, 
using the encoding in TABLE 5-3. 
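The double and quad rows of TABLE 5-3 amount to moving bit 5 of the register number into bit 0 of the instruction field. A minimal sketch, with hypothetical function names:

```c
/* Pack a 6-bit double or quad register number (bit 0 must be 0) into a
 * 5-bit instruction field: b{5} moves into bit 0 of the field. */
unsigned encode_fp_reg(unsigned regnum) {
    return (regnum & 0x1Eu) | ((regnum >> 5) & 1u);
}

/* Recover the 6-bit register number from a 5-bit field that names a
 * double or quad operand. */
unsigned decode_fp_reg(unsigned field) {
    return (field & 0x1Eu) | ((field & 1u) << 5);
}
```

Because a SPARC V8 field for a double or quad register always has bit 0 clear, `decode_fp_reg` maps it to the same register number it had in SPARC V8, which is the compatibility property described in the note above.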


5.3.2 Double and Quad Floating-Point Operands


A single 32-bit F register can hold one single-precision operand; a double-precision 
operand requires an aligned pair of F registers, and a quad-precision operand 
requires an aligned quadruple of F registers. At a given time, the floating-point 
registers can hold a maximum of 32 single-precision, 16 double-precision, or 8 quad- 
precision values in the lower half of the floating-point register file, plus an 
additional 16 double-precision or 8 quad-precision values in the upper half, or 
mixtures of the three sizes. 
Programming | The upper 16 double-precision (upper 8 quad-precision)
Note | floating-point registers cannot be directly loaded by 32-bit load
instructions. Therefore, double- or quad-precision data that is 
only word-aligned in memory cannot be directly loaded into the 
upper registers with LDF[A] instructions. The following 
guidelines are recommended: 


1. Whenever possible, align floating-point data in memory on 
proper address boundaries. If access to a datum is required to 
be atomic, the datum must be properly aligned. 

2. If a double- or quad-precision datum is not properly aligned
in memory but is still aligned on a 4-byte boundary, and access
to the datum in memory is not required to be atomic, then
software should attempt to allocate a register for it in the
lower half of the floating-point register file so that the datum
can be loaded with multiple LDF[A] instructions.

3. If the only available registers for such a datum are located in 
the upper half of the floating-point register file and access to 
the datum in memory is not required to be atomic, the word- 
aligned datum can be loaded into them by one of two 
methods: 

- Load the datum into an upper register by using multiple
LDF[A] instructions to first load it into a double- or
quad-precision register in the lower half of the floating-point
register file, then copy that register to the desired
destination register in the upper half.

- Use an LDDF[A] or LDQF[A] instruction to perform the
load directly into the upper floating-point register,
understanding that use of these instructions on poorly
aligned data can cause a trap (LDDF_mem_not_aligned) on
some implementations, possibly slowing down program
execution significantly.
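The guidelines above can be read as a decision procedure for an 8-byte double. The sketch below is hypothetical: `load_plan` and `plan_double_load` are illustrative names, the copy step is assumed to use an FMOVd, and the alternative of issuing LDDF on word-aligned data (which may trap) is deliberately not chosen.

```c
#include <stdint.h>

/* Hypothetical classification of how to load an 8-byte double into FD[dreg]. */
typedef enum {
    PLAN_LDDF,               /* 8-byte aligned: a single LDDF                  */
    PLAN_ALIGN_REQUIRED,     /* atomic access needed: data must be realigned   */
    PLAN_NOT_WORD_ALIGNED,   /* not even 4-byte aligned: outside these rules   */
    PLAN_TWO_LDF,            /* lower-half register: two LDF instructions      */
    PLAN_TWO_LDF_THEN_FMOVD  /* upper-half register: two LDF into a lower
                                double, then copy it to FD[dreg]               */
} load_plan;

load_plan plan_double_load(uintptr_t addr, unsigned dreg, int need_atomic) {
    if (addr % 8 == 0)
        return PLAN_LDDF;
    if (need_atomic)
        return PLAN_ALIGN_REQUIRED;   /* guideline 1 */
    if (addr % 4 != 0)
        return PLAN_NOT_WORD_ALIGNED;
    /* guidelines 2 and 3: FD[0]..FD[30] overlay single-precision registers */
    return (dreg < 32) ? PLAN_TWO_LDF : PLAN_TWO_LDF_THEN_FMOVD;
}
```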


Programming | If an UltraSPARC Architecture 2005 implementation does not
Note | implement a particular quad floating-point arithmetic operation
in hardware and an invalid quad register operand is specified,
per FSR.ftt priorities in TABLE 5-7, the fp_exception_other
exception occurs with FSR.ftt = 3 (unimplemented_FPop)
instead of with FSR.ftt = 6 (invalid_fp_register).


Implementation | UltraSPARC Architecture 2005 implementations do not
Note | implement any quad floating-point arithmetic operations in
hardware. Therefore, an attempt to execute any of them results
in a trap on the fp_exception_other exception with FSR.ftt = 3
(unimplemented_FPop).
5.4 Floating-Point State Register (FSR)


The Floating-Point State register (FSR) fields, illustrated in FIGURE 5-4, contain FPU 
mode and status information. The lower 32 bits of the FSR are read and written by 
the (deprecated) STFSR and LDFSR instructions, respectively. The 64-bit FSR
register is read by the STXFSR instruction and written by the LDXFSR instruction.
The ver, ftt, qne, unimplemented (for example, ns), and reserved (“—”) fields of 
FSR are not modified by either LDFSR or LDXFSR. 


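[Figure: FSR field layout. Bits 63:38 reserved; 37:36 fcc3; 35:34 fcc2; 33:32 fcc1; 31:30 rd; 29:28 reserved; 27:23 tem; 22 ns; 21:20 reserved; 19:17 ver; 16:14 ftt; 13 qne; 12 reserved; 11:10 fcc0; 9:5 aexc; 4:0 cexc.]


FIGURE 5-4 FSR Fields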


Bits 63-38, 29-28, 21-20, and 12 of FSR are reserved. When read by an STXFSR
instruction, these bits always read as zero.


Programming | For future compatibility, software should issue LDXFSR
Note | instructions only with zero values in these bits or values of these
bits exactly as read by a previous STXFSR.





The subsections on pages 61 through 70 describe the remaining fields in the FSR. 


5.4.1 Floating-Point Condition Codes (fcc0, fcc1, fcc2, fcc3)


The four sets of floating-point condition code fields are labeled fcc0, fcc1, fcc2, and 
fcc3 (fccn refers to any of the floating-point condition code fields). 


The fcc0 field consists of bits 11 and 10 of the FSR, fcc1 consists of bits 33 and 32,
fcc2 consists of bits 35 and 34, and fcc3 consists of bits 37 and 36. Execution of a
floating-point compare instruction (FCMP or FCMPE) updates one of the fccn fields
in the FSR, as selected by the compare instruction. The fccn fields are read by
STXFSR and written by LDXFSR. The fcc0 field can also be read and written by
STFSR and LDFSR, respectively. FBfcc and FBPfcc instructions base their control
transfers on the content of these fields. The MOVcc and FMOVcc instructions can 
conditionally copy a register, based on the contents of these fields. 
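Because the four fcc fields are scattered across the FSR, software that inspects an STXFSR image needs a small shift table. A minimal sketch with hypothetical helper names, using only the bit positions given above:

```c
#include <stdint.h>

/* Bit position of the least significant bit of fcc0..fcc3 in the FSR. */
static const unsigned fcc_shift[4] = { 10, 32, 34, 36 };

/* Read fccn (n = 0..3) from a 64-bit FSR image, as stored by STXFSR. */
unsigned fsr_get_fcc(uint64_t fsr, unsigned n) {
    return (unsigned)((fsr >> fcc_shift[n & 3u]) & 3u);
}

/* Return a copy of fsr with fccn replaced by value (0..3). */
uint64_t fsr_set_fcc(uint64_t fsr, unsigned n, unsigned value) {
    unsigned s = fcc_shift[n & 3u];
    uint64_t mask = (uint64_t)3 << s;
    return (fsr & ~mask) | (((uint64_t)value & 3u) << s);
}
```

Only fcc0 lies in the lower 32 bits, which is why STFSR and LDFSR can reach fcc0 but not fcc1, fcc2, or fcc3.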
In TABLE 5-5, F[rs1] and F[rs2] correspond to the single, double, or quad values in the
floating-point registers specified by a floating-point compare instruction's rs1 and
rs2 fields. The question mark (?) indicates an unordered relation, which is true if
either F[rs1] or F[rs2] is a signalling NaN or a quiet NaN. If FCMP or FCMPE generates
an fp_exception_ieee_754 exception, then fccn is unchanged.


TABLE 5-4 Floating-Point Condition Codes (fccn) Fields of FSR 





Content of fccn    Indicated Relation
0                  F[rs1] = F[rs2]
1                  F[rs1] < F[rs2]
2                  F[rs1] > F[rs2]
3                  F[rs1] ? F[rs2] (unordered)





TABLE 5-5 Floating-Point Condition Codes (fccn) Fields of FSR

Content of fccn       0                  1                  2                  3
Indicated Relation    F[rs1] = F[rs2]    F[rs1] < F[rs2]    F[rs1] > F[rs2]    F[rs1] ? F[rs2]
(FCMP*, FCMPE*)                                                                (unordered)
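The mapping from a comparison outcome to an fccn value can be sketched with ordinary C floating-point comparisons, which follow the same IEEE 754 ordering rules (for example, +0.0 equals -0.0, and any NaN is unordered). This is a hypothetical helper for illustration; it does not model the exceptions FCMP and FCMPE can raise on NaN operands.

```c
#include <math.h>

/* fccn value for comparing a (F[rs1]) with b (F[rs2]), per TABLE 5-4/5-5.
 * Exception signalling on NaN operands is not modelled. */
unsigned fcmp_fcc(double a, double b) {
    if (isnan(a) || isnan(b))
        return 3;            /* unordered */
    if (a == b)
        return 0;            /* equal */
    if (a < b)
        return 1;            /* less than */
    return 2;                /* greater than */
}
```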


5.4.2 Rounding Direction (rd)


Bits 31 and 30 select the rounding direction for floating-point results according to 
IEEE Std 754-1985. TABLE 5-6 shows the encodings. 


TABLE 5-6 Rounding Direction (rd) Field of FSR 





rd    Round Toward
0     Nearest (even, if tie)
1     0
2     +∞
3     -∞


If the interval mode bit of the General Status register has a value of 1 (GSR.im = 1),
then the value of FSR.rd is ignored and floating-point results are instead rounded 
according to GSR.irnd. See General Status Register (GSR) (ASR 19) on page 80 for 
further details. 
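A host-side simulator might select the effective rounding direction and translate it to the C99 floating-point environment as follows. This is a hedged sketch: the function names are hypothetical, `gsr_im` and `gsr_irnd` are assumed to have been extracted from GSR already, and GSR.irnd is assumed to use the same 2-bit encoding as FSR.rd.

```c
#include <fenv.h>
#include <stdint.h>

/* FSR.rd occupies bits 31:30 of the FSR. */
unsigned fsr_rd(uint64_t fsr) {
    return (unsigned)((fsr >> 30) & 3u);
}

/* Rounding code in effect: GSR.irnd when GSR.im = 1, otherwise FSR.rd. */
unsigned effective_rounding(uint64_t fsr, unsigned gsr_im, unsigned gsr_irnd) {
    return gsr_im ? (gsr_irnd & 3u) : fsr_rd(fsr);
}

/* Map the encoding of TABLE 5-6 to the C99 <fenv.h> rounding macros. */
int rd_to_fenv(unsigned rd) {
    switch (rd & 3u) {
    case 0:  return FE_TONEAREST;   /* nearest, ties to even */
    case 1:  return FE_TOWARDZERO;  /* toward 0              */
    case 2:  return FE_UPWARD;      /* toward +infinity      */
    default: return FE_DOWNWARD;    /* toward -infinity      */
    }
}
```

Such a simulator could then call `fesetround(rd_to_fenv(effective_rounding(fsr, im, irnd)))` before emulating an FPop on the host, on platforms that define all four rounding macros.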


5.4.3 Trap Enable Mask (tem)


Bits 27 through 23 are enable bits for each of the five IEEE-754 floating-point
exceptions that can be indicated in the current exception field (cexc). See FIGURE 5-6
on page 69. If a floating-point instruction generates one or more exceptions and the
tem bit corresponding to any of the exceptions is 1, then this condition causes an
fp_exception_ieee_754 trap. A tem bit value of 0 prevents the corresponding IEEE
754 exception type from generating a trap.


5.4.4 Nonstandard Floating-Point (ns)


On an UltraSPARC Architecture 2005 processor, FSR.ns is a reserved bit; it always
reads as 0 and writes to it are ignored. (impl. dep. #18-V8)


5.4.5 FPU Version (ver)


IMPL. DEP. #19-V8: Bits 19 through 17 identify one or more particular 
implementations of the FPU architecture. 


For each SPARC V9 IU implementation (as identified by its HVER.impl field), there 
may be one or more FPU implementations, or none. FSR.ver identifies the particular 
FPU implementation present. The value in FSR.ver for each implementation is 
strictly implementation dependent. Consult the appropriate document for each 
implementation for its setting of FSR.ver. 


FSR.ver = 7 is reserved to indicate that no hardware floating-point controller is 
present. 


The ver field of FSR is read-only; it cannot be modified by the LDFSR or LDXFSR 
instructions. 


5.4.6 Floating-Point Trap Type (ftt)


Several conditions can cause a floating-point exception trap. When a floating-point 
exception trap occurs, FSR.ftt (FSR{16:14}) identifies the cause of the exception, the 
“floating-point trap type." After a floating-point exception occurs, FSR.ftt encodes 
the type of the floating-point exception until it is cleared (set to 0) by execution of an 
STFSR, STXFSR, or FPop that does not cause a trap due to a floating-point exception. 


The FSR.ftt field can be read by a STFSR or STXFSR instruction. The LDFSR and 
LDXFSR instructions do not affect FSR.ftt. 
Privileged software that handles floating-point traps must execute an STFSR (or
STXFSR) to determine the floating-point trap type. STFSR and STXFSR set FSR.ftt to 
zero after the store completes without error. If the store generates an error and does 
not complete, FSR.ftt remains unchanged. 


Programming | Neither LDFSR nor LDXFSR can be used for the purpose of
Note | clearing the ftt field, since both leave ftt unchanged. However,
executing a nontrapping floating-point operate (FPop) 
instruction such as “fmovs %f0,%f0” prior to returning to 
nonprivileged mode will zero FSR.ftt. The ftt field remains zero 
until the next FPop instruction completes execution. 





FSR.ftt encodes the primary condition ("floating-point trap type") that caused the 
generation of an fp_exception_other or fp_exception_ieee_754 exception. It is
possible for more than one such condition to occur simultaneously; in such a case,
only the highest-priority condition will be encoded in FSR.ftt. The conditions
leading to fp_exception_other and fp_exception_ieee_754 exceptions, their relative
priorities, and the corresponding FSR.ftt values are listed in TABLE 5-7. Note that the 
FSR.ftt values 4 and 5 were defined in the SPARC V9 architecture but are not 
currently in use, and that the value 7 is reserved for future architectural use. 


TABLE 5-7 FSR Floating-Point Trap Type (ftt) Field

Condition Detected During    Relative Priority    Result: FSR.ftt    Exception Generated
Execution of an FPop         (1 = highest)        Set to Value
unimplemented_FPop           10                   3                  fp_exception_other
invalid_fp_register          20                   6                  fp_exception_other
unfinished_FPop              30                   2                  fp_exception_other
IEEE_754_exception           40                   1                  fp_exception_ieee_754
Reserved                     -                    4, 5, 7            -
(none detected)              -                    0                  -
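When several conditions are detected at once, only the highest-priority one is recorded. The following hypothetical sketch encodes the detected conditions as bit flags (an illustrative encoding, not an architectural one) and returns the FSR.ftt value from TABLE 5-7.

```c
/* Hypothetical bit-flag encoding of the conditions in TABLE 5-7. */
enum {
    COND_UNIMPLEMENTED_FPOP  = 1u << 0,
    COND_INVALID_FP_REGISTER = 1u << 1,
    COND_UNFINISHED_FPOP     = 1u << 2,
    COND_IEEE_754_EXCEPTION  = 1u << 3
};

/* Return the FSR.ftt value for the highest-priority detected condition,
 * checking conditions in priority order (highest first). */
unsigned select_ftt(unsigned detected) {
    if (detected & COND_UNIMPLEMENTED_FPOP)  return 3;
    if (detected & COND_INVALID_FP_REGISTER) return 6;
    if (detected & COND_UNFINISHED_FPOP)     return 2;
    if (detected & COND_IEEE_754_EXCEPTION)  return 1;
    return 0;                                /* none detected */
}
```

The first test case mirrors the quad-precision situation in the implementation notes: an unimplemented quad FPop with a misaligned register still reports ftt = 3.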





The IEEE_754_exception, unimplemented_FPop, and unfinished_FPop conditions 
will likely arise occasionally in the normal course of computation and must be 
recoverable by system software. 


When a floating-point trap occurs, the following results are observed by user 
software: 


1. The value of aexc is unchanged. 


2. When an fp_exception_ieee_754 trap occurs, a bit corresponding to the trapping 
exception is set in cexc. On other traps, the value of cexc is unchanged. 


3. The source and destination registers are unchanged. 


4. The value of fccn is unchanged. 
The foregoing describes the result seen by a user trap handler if an IEEE exception is
signalled, either immediately from an fp_exception_ieee_754 exception or after
recovery from an unfinished_FPop or unimplemented_FPop. In either case, cexc as
seen by the trap handler reflects the exception causing the trap.


In the cases of an fp_exception_other exception with a floating-point trap type of
unfinished_FPop or unimplemented_FPop that does not subsequently generate an
IEEE trap, the recovery software should set cexc, aexc, and the destination register
or fccn, as appropriate.


ftt = 1 (IEEE 754 exception). The IEEE 754 exception floating-point trap type 
indicates the occurrence of a floating-point exception conforming to IEEE Std 754- 
1985. The IEEE 754 exception type (overflow, inexact, etc.) is set in the cexc field. The 
aexc and fccn fields and the destination F register are unchanged. 


ftt = 2 (unfinished_FPop). The unfinished_FPop floating-point trap type indicates
that the virtual processor was unable to generate correct results or that exceptions as 
defined by IEEE Std 754-1985 have occurred. In cases where exceptions have 
occurred, the cexc field is unchanged. 


IMPL. DEP. #248-U3: The conditions under which an fp_exception_other exception
with floating-point trap type of unfinished_FPop can occur are implementation
dependent. An implementation may cause fp_exception_other with
FSR.ftt = unfinished_FPop under a different (but specified) set of conditions.


ftt = 3 (unimplemented_FPop). The unimplemented_FPop floating-point trap
type indicates that the virtual processor decoded an FPop that it does not implement
in hardware. In this case, the cexc field is unchanged.


For example, all quad-precision FPop variations in an UltraSPARC Architecture 2005
virtual processor cause an fp_exception_other exception, setting
FSR.ftt = unimplemented_FPop.


Forward | The next revision of the UltraSPARC Architecture is expected to
Compatibility | eliminate "unimplemented_FPop", to simplify handling of
Note | unimplemented instructions. At that point, all conditions which
currently cause fp_exception_other with FSR.ftt = 3 will
cause an illegal_instruction exception, instead. FSR.ftt = 3 and
the trap type associated with fp_exception_other will become
reserved for other possible future uses.
ftt = 4 (Reserved).


SPARC V9 | In the SPARC V9 architecture, FSR.ftt = 4 was defined to be 
Compatibility | "sequence error", for use with certain error conditions 
Note | associated with a floating-point queue (FQ). Since UltraSPARC 
Architecture implementations generate precise (rather than 
deferred) traps for floating-point operations, an FQ is not 
needed; therefore sequence_error conditions cannot occur and 
ftt = 4 has been returned to the pool of reserved ftt values.





ftt = 5 (Reserved). 


SPARC V9 | In the SPARC V9 architecture, FSR.ftt = 5 was defined to be
Compatibility | "hardware_error", for use with hardware error conditions
Note | associated with an external floating-point unit (FPU) operating
asynchronously to the main processor (IU). Since UltraSPARC
Architecture processors are now implemented with an integral
FPU, a hardware error in the FPU can generate an exception
directly, rather than indirectly reporting the error through FSR.ftt
(as was required when FPUs were external to IUs). Therefore,
ftt = 5 has been returned to the pool of reserved ftt values.





ftt = 6 (invalid_fp_register). This trap type indicates that one or more F register 
operands of an FPop are misaligned; that is, a quad-precision register number is not 
0 mod 4. An implementation generates an fp_exception_other trap with FSR.ftt =
invalid_fp_register in this case.


Implementation | Per FSR.ftt priorities in TABLE 5-7, if an UltraSPARC Architecture 
Note | 2005 processor does not implement a particular quad FPop in 
hardware, that FPop generates an fp_exception_other exception
with FSR.ftt = 3 (unimplemented_FPop) instead of
fp_exception_other with FSR.ftt = 6 (invalid_fp_register),
regardless of the specified F registers. 


5.4.7 FQ Not Empty (qne) 


Since UltraSPARC Architecture 2005 virtual processors do not implement a floating- 
point queue, FSR.qne always reads as zero and writes to FSR.qne are ignored. 


5.4.8 Accrued Exceptions (aexc) 


Bits 9 through 5 accumulate IEEE 754 floating-point exceptions as long as floating- 
point exception traps are disabled through the tem field. See FIGURE 5-7 on page 69. 
After an FPop completes with ftt = 0, the tem and cexc fields are logically anded
together. If the result is nonzero, aexc is left unchanged and an
fp_exception_ieee_754 trap is generated; otherwise, the new cexc field is ored into
the aexc field and no trap is generated. Thus, while (and only while) traps are
masked, exceptions are accumulated in the aexc field.


FSR.aexc can be set to a specific value when an LDFSR or LDXFSR instruction is 
executed. 
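The tem/cexc/aexc rule above can be sketched as a single update function. This is a hypothetical model: `fsr_update_after_fpop` is an illustrative name, and the caller is assumed to have already computed the new cexc value according to Current Exception (cexc) on page 68.

```c
#include <stdbool.h>
#include <stdint.h>

#define FSR_EXC_MASK   0x1Fu  /* each exception field is 5 bits wide */
#define FSR_AEXC_SHIFT 5      /* aexc = FSR{9:5}   */
#define FSR_TEM_SHIFT  23     /* tem  = FSR{27:23} */

/* Update FSR after an FPop completes with ftt = 0 and sets cexc to
 * new_cexc. Returns true if an fp_exception_ieee_754 trap is generated. */
bool fsr_update_after_fpop(uint64_t *fsr, unsigned new_cexc) {
    uint64_t f = *fsr;
    unsigned tem = (unsigned)(f >> FSR_TEM_SHIFT) & FSR_EXC_MASK;

    new_cexc &= FSR_EXC_MASK;
    f = (f & ~(uint64_t)FSR_EXC_MASK) | new_cexc;   /* cexc = FSR{4:0} */

    if (tem & new_cexc) {         /* an enabled exception: trap */
        *fsr = f;                 /* aexc is left unchanged     */
        return true;
    }
    f |= (uint64_t)new_cexc << FSR_AEXC_SHIFT;      /* accrue into aexc */
    *fsr = f;
    return false;
}
```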


5.4.9 Current Exception (cexc)


FSR.cexc (FSR{4:0}) indicates whether one or more IEEE 754 floating-point 
exceptions were generated by the most recently executed FPop instruction. The 
absence of an exception causes the corresponding bit to be cleared (set to 0). See
FIGURE 5-8 on page 69.


Programming | If the FPop traps and software emulates or finishes the instruction,
Note | the system software in the trap handler is responsible for 
creating a correct FSR.cexc value before returning to a 
nonprivileged program. 


The cexc bits are set as described in Floating-Point Exception Fields on page 68, by the
execution of an FPop that either does not cause a trap or causes an
fp_exception_ieee_754 exception with FSR.ftt = IEEE_754_exception. An IEEE 754
exception that traps shall cause exactly one bit in FSR.cexc to be set, corresponding
to the detected IEEE Std 754-1985 exception.


Floating-point operations which cause an overflow or underflow condition may also 
cause an “inexact” condition. For overflow and underflow conditions, FSR.cexc bits 
are set and trapping occurs as follows: 


• If an IEEE 754 overflow condition occurs:

- if FSR.tem.ofm = 0 and FSR.tem.nxm = 0, the FSR.cexc.ofc and FSR.cexc.nxc bits
are both set to 1, the other three bits of FSR.cexc are set to 0, and an
fp_exception_ieee_754 trap does not occur.

- if FSR.tem.ofm = 0 and FSR.tem.nxm = 1, the FSR.cexc.nxc bit is set to 1, the other
four bits of FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does
occur.

- if FSR.tem.ofm = 1, the FSR.cexc.ofc bit is set to 1, the other four bits of
FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does occur.


• If an IEEE 754 underflow condition occurs:

- if FSR.tem.ufm = 0 and FSR.tem.nxm = 0, the FSR.cexc.ufc and FSR.cexc.nxc
bits are both set to 1, the other three bits of FSR.cexc are set to 0, and an
fp_exception_ieee_754 trap does not occur.

- if FSR.tem.ufm = 0 and FSR.tem.nxm = 1, the FSR.cexc.nxc bit is set to 1, the
other four bits of FSR.cexc are set to 0, and an fp_exception_ieee_754 trap
does occur.

- if FSR.tem.ufm = 1, the FSR.cexc.ufc bit is set to 1, the other four bits of
FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does occur.


The above behavior is summarized in TABLE 5-8 (where "v" indicates "exception was 
detected” and "x" indicates "don't care"): 


TABLE 5-8 Setting of FSR.cexc Bits

Conditions                                            Results
Exception(s) Detected    Trap Enable Mask bits        fp_exception_ieee_754    Current Exception bits
in F.p. operation        (in FSR.tem)                 Trap Occurs?             (in FSR.cexc)
of     uf     nx         ofm    ufm    nxm                                     ofc    ufc    nxc
-      -      -          x      x      x              no                       0      0      0
-      -      v          x      x      0              no                       0      0      1
-      v(1)   v(1)       x      0      0              no                       0      1      1
v(2)   -      v(2)       0      x      0              no                       1      0      1
-      -      v          x      x      1              yes                      0      0      1
-      v      v          x      0      1              yes                      0      0      1
-      v      -          x      1      x              yes                      0      1      0
-      v      v          x      1      x              yes                      0      1      0
v(2)   -      v(2)       1      x      x              yes                      1      0      0
v(2)   -      v(2)       0      x      1              yes                      0      0      1

Notes: (1) When the underflow trap is disabled (FSR.tem.ufm = 0),
           underflow is always accompanied by inexact.
       (2) Overflow is always accompanied by inexact.
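Every row of TABLE 5-8 follows from one rule: if any detected exception among of, uf, and nx is enabled in tem, the trap is taken and only that exception's bit is set; otherwise all detected bits are set. The sketch below is hypothetical (names `cexc_for` and `EXC_*` are illustrative) and leaves invalid and division-by-zero out of scope, as the table does.

```c
#include <stdbool.h>

/* Bit positions within a 5-bit exception field (cexc, aexc, or tem). */
enum { EXC_NX = 1 << 0, EXC_DZ = 1 << 1, EXC_UF = 1 << 2, EXC_OF = 1 << 3, EXC_NV = 1 << 4 };

/* Compute FSR.cexc for the overflow/underflow/inexact conditions detected
 * by an FPop, and whether an fp_exception_ieee_754 trap occurs. */
unsigned cexc_for(bool of, bool uf, bool nx, unsigned tem, bool *trap) {
    if (of && (tem & EXC_OF)) { *trap = true; return EXC_OF; }
    if (uf && (tem & EXC_UF)) { *trap = true; return EXC_UF; }
    if (nx && (tem & EXC_NX)) { *trap = true; return EXC_NX; }
    *trap = false;
    return (of ? EXC_OF : 0u) | (uf ? EXC_UF : 0u) | (nx ? EXC_NX : 0u);
}
```

Overflow and underflow cannot be detected together, so the order of the first two checks does not matter; checking them before inexact is what makes an enabled ofm or ufm win over nxm, as in the last rows of the table.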


If the execution of an FPop causes a trap other than fp_exception_ieee_754,
FSR.cexc is left unchanged.


5.4.10 Floating-Point Exception Fields 


The current and accrued exception fields and the trap enable mask assume the 
following definitions of the floating-point exception conditions (per IEEE Std 754- 
1985): 
[Figure: FSR.tem fields nvm (bit 27), ofm (26), ufm (25), dzm (24), nxm (23); all RW.]
FIGURE 5-6 Trap Enable Mask (tem) Fields of FSR


[Figure: FSR.aexc fields nva (bit 9), ofa (8), ufa (7), dza (6), nxa (5); all RW.]
FIGURE 5-7 Accrued Exception Bits (aexc) Fields of FSR


[Figure: FSR.cexc fields nvc (bit 4), ofc (3), ufc (2), dzc (1), nxc (0); all RW.]
FIGURE 5-8 Current Exception Bits (cexc) Fields of FSR


Invalid (nvc, nva). An operand is improper for the operation to be performed.
For example, 0.0 ÷ 0.0 and ∞ - ∞ are invalid; 1 = invalid operand(s), 0 = valid
operand(s).


Overflow (ofc, ofa). The result, rounded as if the exponent range were 
unbounded, would be larger in magnitude than the destination format's largest 
finite number; 1 = overflow, 0 = no overflow. 


Underflow (ufc, ufa). The rounded result is inexact and would be smaller in 
magnitude than the smallest normalized number in the indicated format; 
1 = underflow, 0 = no underflow. 


Underflow is never indicated when the correct unrounded result is 0. 
Otherwise, when the correct unrounded result is not 0: 


If FSR.tem.ufm = 0: Underflow occurs if a nonzero result is tiny and a loss of 
accuracy occurs. 


If FSR.tem.ufm = 1: Underflow occurs if a nonzero result is tiny. 


The SPARC V9 architecture allows tininess to be detected either before or after 
rounding. However, in all cases and regardless of the setting of FSR.tem.ufm, an 
UltraSPARC Architecture strand detects tininess before rounding (impl. dep. #55-V8- 
Cs10). See Trapped Underflow Definition (ufm = 1) on page 389 and Untrapped 
Underflow Definition (ufm = 0) on page 389 for additional details. 
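The two underflow definitions differ only in whether inexactness is required. The following sketch is hypothetical; it treats the "loss of accuracy" condition as an inexact result, which is an assumption on our part, and takes tininess as already detected before rounding.

```c
#include <stdbool.h>

/* Whether the underflow exception is signalled.
 *   unrounded_is_zero: the exact, unrounded result is 0
 *   tiny:    the nonzero result is tiny (detected before rounding)
 *   inexact: used here as the "loss of accuracy" condition (assumption)
 *   ufm:     FSR.tem.ufm */
bool underflow_signalled(bool unrounded_is_zero, bool tiny,
                         bool inexact, bool ufm) {
    if (unrounded_is_zero || !tiny)
        return false;           /* never signalled for an exact zero */
    return ufm || inexact;      /* trapped: tiny alone; untrapped: tiny and inexact */
}
```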


Division by zero (dzc, dza). An infinite result is produced exactly from finite 
operands. For example, X ÷ 0.0, where X is subnormal or normalized; 1 = division
by zero, 0 = no division by zero. 
Inexact (nxc, nxa). The rounded result of an operation differs from the infinitely
precise unrounded result; 1 = inexact result, 0 = exact result. 


5.4.11 FSR Conformance


An UltraSPARC Architecture implementation implements the tem, cexc, and aexc 
fields of FSR in hardware, conforming to IEEE Std 754-1985 (impl. dep. #22-V8). 


Programming | Privileged software (or a combination of privileged and
Note | nonprivileged software) must be capable of simulating the
operation of the FPU in order to handle the fp_exception_other
(with FSR.ftt = unfinished_FPop or unimplemented_FPop) and
IEEE_754_exception floating-point trap types properly. Thus, a
user application program always sees an FSR that is fully
compliant with IEEE Std 754-1985.








5.5 Ancillary State Registers


The SPARC V9 architecture defines several optional ancillary state registers (ASRs) 
and allows for additional ones. Access to a particular ASR may be privileged or 
nonprivileged. 


An ASR is read and written with the Read State Register and Write State Register 
instructions, respectively. These instructions are privileged if the accessed register is 
privileged. 


The SPARC V9 architecture left ASRs numbered 16-31 available for implementation- 
dependent uses. UltraSPARC Architecture virtual processors implement the ASRs 
summarized in TABLE 5-9 and defined in the following subsections. 


Each virtual processor contains its own set of ASRs; ASRs are not shared among 
virtual processors. 





TABLE 5-9 ASR Register Summary

ASR      ASR name       Register                                        Read by                 Written by
number                                                                  Instruction(s)          Instruction(s)
0        Y              Y register (deprecated)                         RDY                     WRY
1        -              Reserved                                        -                       -
2        CCR            Condition Codes register                        RDCCR                   WRCCR
3        ASI            ASI register                                    RDASI                   WRASI
4        TICK           TICK register                                   RDTICK, RDPR (TICK)     WRPR (TICK)
5        PC             Program Counter (PC)                            RDPC                    (all instructions)
6        FPRS           Floating-Point Registers Status register        RDFPRS                  WRFPRS
7-14     -              Reserved                                        -                       -
15       -              Reserved                                        -                       -
16-31    non-SPARC V9 ASRs
16       PCR            Performance Control registers (PCR)             RDPCR                   WRPCR
17       PIC            Performance Instrumentation Counters (PIC)      RDPIC                   WRPIC
18       -              Implementation dependent                        -                       -
                        (impl. dep. #8-V8-Cs20, 9-V8-Cs20)
19       GSR            General Status register (GSR)                   RDGSR, FALIGNDATA,      WRGSR, BMASK, SIAM
                                                                        many VIS and
                                                                        floating-point
                                                                        instructions
20       SOFTINT_SET    (pseudo-register, for "Write 1s Set" to         -                       WRSOFTINT_SET
                        SOFTINT register, ASR 22)
21       SOFTINT_CLR    (pseudo-register, for "Write 1s Clear" to       -                       WRSOFTINT_CLR
                        SOFTINT register, ASR 22)
22       SOFTINT        per-virtual processor Soft Interrupt register   RDSOFTINT               WRSOFTINT
23       TICK_CMPR      Tick Compare register                           RDTICK_CMPR             WRTICK_CMPR
24       STICK          System Tick register                            RDSTICK                 WRSTICK
25       STICK_CMPR     System Tick Compare register                    RDSTICK_CMPR            WRSTICK_CMPR
26-31    -              Implementation dependent                        -                       -
                        (impl. dep. #8-V8-Cs20, 9-V8-Cs20)
5.5.1 32-bit Multiply/Divide Register (Y) (ASR 0)


The Y register is deprecated; it is provided only for compatibility with previous
versions of the architecture. It should not be used in new SPARC V9 software.
It is recommended that all instructions that reference the Y register (that is,
SMUL, SMULcc, UMUL, UMULcc, MULScc, SDIV, SDIVcc, UDIV, UDIVcc,
RDY, and WRY) be avoided. For suitable substitute instructions, see the
following pages: for the multiply instructions, see pages 329 and 374; for
the multiply step instruction, see page 285; for division instructions, see pages
321 and 372; for the read instruction, see page 304; and for the write
instruction, see page 377.





The low-order 32 bits of the Y register, illustrated in FIGURE 5-9, contain the more 
significant word of the 64-bit product of an integer multiplication, as a result of 
either a 32-bit integer multiply (SMUL, SMULcc, UMUL, UMULcc) instruction or an 
integer multiply step (MULScc) instruction. The Y register also holds the more 
significant word of the 64-bit dividend for a 32-bit integer divide (SDIV, SDIVcc, 
UDIV, UDIVcc) instruction. 


[Figure: Y register. Bits 63:32 are read-only and read as 0; bits 31:0 are RW.]
FIGURE 5-9 Y Register


Although Y is a 64-bit register, its high-order 32 bits always read as 0. 


The Y register may be explicitly read and written by the RDY and WRY instructions, 
respectively. 
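How Y participates in 32-bit multiplication and division can be sketched as follows. This is a hypothetical model: `umul` and `udiv_dividend` are illustrative names, and it assumes, as in SPARC V9, that UMUL also writes the full 64-bit product to R[rd]. Division results and overflow handling are not modelled.

```c
#include <stdint.h>

/* UMUL: 32 x 32 -> 64-bit unsigned product. Returns the value written to
 * R[rd]; Y receives the more significant word (Y{63:32} reads as 0). */
uint64_t umul(uint64_t rs1, uint64_t rs2, uint64_t *y) {
    uint64_t product = (uint64_t)(uint32_t)rs1 * (uint32_t)rs2;
    *y = product >> 32;
    return product;
}

/* UDIV dividend: Y supplies the more significant word and R[rs1]{31:0}
 * the less significant word. */
uint64_t udiv_dividend(uint64_t y, uint64_t rs1) {
    return ((y & 0xFFFFFFFFu) << 32) | (uint32_t)rs1;
}
```

Since the 64-bit instructions (MULX, UDIVX, SDIVX) operate entirely in R registers, they avoid this extra Y state, which is part of why Y is deprecated.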


5.5.2 Integer Condition Codes Register (CCR) (ASR 2)


The Condition Codes Register (CCR), shown in FIGURE 5-10, contains the integer 
condition codes. The CCR register may be explicitly read and written by the RDCCR 
and WRCCR instructions, respectively. 


[Figure: CCR. Bits 7:4 hold xcc and bits 3:0 hold icc; both RW.]
FIGURE 5-10 Condition Codes Register
5.5.2.1 Condition Codes (CCR.xcc and CCR.icc)


All instructions that set integer condition codes set both the xcc and icc fields. The 
xcc condition codes indicate the result of an operation when viewed as a 64-bit 
operation. The icc condition codes indicate the result of an operation when viewed 
as a 32-bit operation. For example, if an operation results in the 64-bit value
0000 0000 FFFF FFFF (hexadecimal), the 32-bit result is negative (icc.n is set to 1)
but the 64-bit result is nonnegative (xcc.n is set to 0).


Each of the 4-bit condition-code fields is composed of four 1-bit subfields, as shown 
in FIGURE 5-11. 


FIGURE 5-11 Integer Condition Codes (CCR.icc and CCR.xcc) (within each 4-bit field, from most to least significant bit: n, z, v, c; xcc occupies bits 7:4 and icc occupies bits 3:0; all RW)


The n bits indicate whether the two's-complement ALU result was negative for the 
last instruction that modified the integer condition codes; 1 = negative, 0 = not 
negative. 


The z bits indicate whether the ALU result was zero for the last instruction that 
modified the integer condition codes; 1 = zero, 0 = nonzero. 


The v bits indicate whether the ALU result was outside the range of (was not
representable in) 64-bit (xcc) or 32-bit (icc) two's complement notation for the last
instruction that modified the integer condition codes; 1 = overflow, 0 = no overflow.


The c bits indicate whether a two's complement carry (or borrow) occurred during the
last instruction that modified the integer condition codes. Carry is set on addition if
there is a carry out of bit 63 (xcc) or bit 31 (icc). Carry is set on subtraction if there is
a borrow into bit 63 (xcc) or bit 31 (icc); 1 = borrow, 0 = no borrow (see TABLE 5-10).
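To make the relationship between the two fields concrete, here is a minimal C sketch that models how an add that sets condition codes could derive the n, z, v, and c bits for both views. It is a software model, not a description of the hardware; the names cc_field, add_cc32, and add_cc64 are hypothetical. It reproduces the example given earlier: a result of 0000 0000 FFFF FFFF₁₆ yields icc.n = 1 but xcc.n = 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 4-bit condition-code field (hypothetical software model). */
typedef struct { bool n, z, v, c; } cc_field;

/* Flags for a 64-bit add, as recorded in xcc. */
static cc_field add_cc64(uint64_t a, uint64_t b) {
    uint64_t r = a + b;
    cc_field f;
    f.n = (r >> 63) & 1;                        /* sign of the 64-bit result */
    f.z = (r == 0);
    /* Overflow: operands share a sign and the result's sign differs. */
    f.v = ((~(a ^ b) & (a ^ r)) >> 63) & 1;
    f.c = (r < a);                              /* carry out of bit 63 */
    return f;
}

/* Flags for the same add viewed as a 32-bit operation, as recorded in icc. */
static cc_field add_cc32(uint64_t a, uint64_t b) {
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b;
    uint32_t r = a32 + b32;                     /* wraps modulo 2^32 */
    cc_field f;
    f.n = (r >> 31) & 1;                        /* sign of the 32-bit result */
    f.z = (r == 0);
    f.v = ((~(a32 ^ b32) & (a32 ^ r)) >> 31) & 1;
    f.c = (r < a32);                            /* carry out of bit 31 */
    return f;
}
```

For instance, adding 1 to 0000 0000 FFFF FFFF₁₆ sets icc.c and icc.z but leaves xcc.c and xcc.z at 0, because the carry out of bit 31 stays inside the 64-bit result.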


TABLE 5-10 Setting of Carry (Borrow) Bits for Subtraction That Sets CCs

Unsigned Comparison of Operand Values    Setting of Carry Bits in CCR
R[rs1]{31:0} ≥ R[rs2]{31:0}              CCR.icc.c ← 0
R[rs1]{31:0} < R[rs2]{31:0}              CCR.icc.c ← 1
R[rs1]{63:0} ≥ R[rs2]{63:0}              CCR.xcc.c ← 0
R[rs1]{63:0} < R[rs2]{63:0}              CCR.xcc.c ← 1
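TABLE 5-10 amounts to an unsigned less-than comparison at the relevant operand width. The following minimal C sketch expresses it directly; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* icc.c after rs1 - rs2: borrow when bits 31:0 of rs1 are
   unsigned-less-than bits 31:0 of rs2 (TABLE 5-10). */
static bool sub_icc_c(uint64_t rs1, uint64_t rs2) {
    return (uint32_t)rs1 < (uint32_t)rs2;
}

/* xcc.c after rs1 - rs2: borrow when rs1 < rs2 as 64-bit unsigned values. */
static bool sub_xcc_c(uint64_t rs1, uint64_t rs2) {
    return rs1 < rs2;
}
```

The two fields can diverge: subtracting 1 from 0000 0001 0000 0000₁₆ borrows in the 32-bit view (the low word is 0) but not in the 64-bit view.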


Both fields of CCR (xcc and icc) are modified by arithmetic and logical instructions
whose names end with the letters "cc" (for example, ANDcc), and by the
WRCCR instruction. They can also be modified by a DONE or RETRY instruction, which
replaces these bits with the contents of TSTATE.ccr. The behavior of the following
instructions is conditioned by the contents of CCR.icc or CCR.xcc:


• BPcc and Tcc instructions (conditional transfer of control)
• Bicc (conditional transfer of control, based on CCR.icc only)
• MOVcc instruction (conditionally move the contents of an integer register)
• FMOVcc instruction (conditionally move the contents of a floating-point register)


Extended (64-bit) integer condition codes (xcc). Bits 7 through 4 are the IU 
condition codes, which indicate the results of an integer operation, with both of the 
operands and the result considered to be 64 bits wide. 


32-bit integer condition codes (icc). Bits 3 through 0 are the IU condition codes,
which indicate the results of an integer operation, with both of the operands and the
result considered to be 32 bits wide.


5.5.3 Address Space Identifier (ASI) Register (ASR 3)


The Address Space Identifier register (FIGURE 5-12) specifies the address space
identifier to be used for load and store alternate instructions that use the "rs1 +
simm13" addressing form.


The ASI register may be explicitly read and written by the RDASI and WRASI 
instructions, respectively. 


Software (executing in any privilege mode) may write any value into the ASI
register. However, values in the range 00₁₆ to 7F₁₆ are "restricted" ASIs; an access
using an ASI in that range is permitted only to software executing in
a mode with sufficient privileges for the ASI. When an instruction executing in
nonprivileged mode attempts an access using an ASI in the range 00₁₆ to 7F₁₆, or an
instruction executing in privileged mode attempts an access using an ASI in the range
30₁₆ to 7F₁₆, a privileged_action exception is generated. See Chapter 10, Address Space
Identifiers (ASIs), for details.
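The two restricted ranges above can be captured in a small predicate. This is a minimal C sketch under stated assumptions: the enum and function names are hypothetical, and only the privileged_action condition described in this paragraph is modeled (other ASI checks from Chapter 10 are not).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_NONPRIV, MODE_PRIV, MODE_HYPERPRIV } exec_mode;

/* True if an access using `asi` in `mode` would raise privileged_action
   according to the ranges given in the text. */
static bool asi_privileged_action(exec_mode mode, uint8_t asi) {
    switch (mode) {
    case MODE_NONPRIV:
        return asi <= 0x7F;                  /* 00-7F are restricted */
    case MODE_PRIV:
        return asi >= 0x30 && asi <= 0x7F;   /* 30-7F need hyperprivilege */
    default:
        return false;                        /* assumption: no restriction modeled */
    }
}
```

The hyperprivileged branch returns false only because the text lists no restricted range for that mode; treat it as an assumption. The check applies when an access is attempted: writing any value into the ASI register itself never faults.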


FIGURE 5-12 Address Space Identifier Register 


5.5.4 Tick (TICK) Register (ASR 4)


FIGURE 5-13 illustrates the TICK register. 


74 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


FIGURE 5-13 TICK Register (bit 63: npt; bits 62:0: counter)


The counter field of the TICK register is a 63-bit counter that counts strand clock 
cycles. 


Bit 63 of the TICK register is the nonprivileged trap (npt) bit, which controls 
access to the TICK register by nonprivileged software. 


Hyperprivileged software can always read the TICK register, with either the RDPR 
or RDTICK instruction. 


Hyperprivileged software can always write to the TICK register with the WRPR 
instruction (there is no distinct WRTICK instruction). 


Privileged software can always read the TICK register, with either the RDPR or 
RDTICK instruction. 


Privileged software cannot write to the TICK register; an attempt to do so (with the
WRPR instruction) results in an illegal_instruction exception.


Nonprivileged software can read the TICK register by using the RDTICK instruction,
but only when nonprivileged access to TICK is enabled (TICK.npt = 0) by
hyperprivileged software. If nonprivileged access is disabled (TICK.npt = 1), an
attempt by nonprivileged software to read the TICK register using the RDTICK
instruction causes a privileged_action exception.

An attempt by nonprivileged software at any time to read the TICK register using
the privileged RDPR instruction causes a privileged_opcode exception.

Nonprivileged software cannot write the TICK register. An attempt by nonprivileged
software to write the TICK register using the privileged WRPR instruction causes a
privileged_opcode exception.
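The TICK access rules above form a small decision table. The following minimal C sketch encodes them; the enum and function names are hypothetical, and only the three operations named in the text (RDTICK, RDPR of TICK, WRPR of TICK) are modeled.

```c
#include <stdbool.h>

typedef enum { MODE_NONPRIV, MODE_PRIV, MODE_HYPERPRIV } exec_mode;
typedef enum { OP_RDTICK, OP_RDPR_TICK, OP_WRPR_TICK } tick_op;
typedef enum { TICK_OK, TICK_ILLEGAL_INSTRUCTION,
               TICK_PRIVILEGED_ACTION, TICK_PRIVILEGED_OPCODE } tick_result;

static tick_result tick_access(exec_mode mode, tick_op op, bool npt) {
    switch (mode) {
    case MODE_HYPERPRIV:
        return TICK_OK;                            /* may always read and write */
    case MODE_PRIV:
        /* Privileged reads always succeed; WRPR to TICK is illegal. */
        return op == OP_WRPR_TICK ? TICK_ILLEGAL_INSTRUCTION : TICK_OK;
    case MODE_NONPRIV:
    default:
        if (op != OP_RDTICK)
            return TICK_PRIVILEGED_OPCODE;         /* RDPR/WRPR are privileged */
        return npt ? TICK_PRIVILEGED_ACTION : TICK_OK;
    }
}
```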


TICK.npt is set to 1 by a power-on reset trap. The value of TICK.counter is reset to 0 
after a power-on reset trap. 


After the TICK register is written, reading the TICK register returns a value 
incremented (by 1 or more) from the last value written, rather than from some 
previous value of counter. The number of counts between a write and a subsequent 
read does not accurately reflect the number of strand cycles between the write and 
the read. Software may rely only on read-to-read counts of the TICK register for 
accurate timing, not on write-to-read counts. 
The difference between the values read from the TICK register on two reads is
intended to reflect the number of strand cycles executed between the reads.
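A minimal C sketch of the read-to-read measurement described above. It assumes both values come from reads (not from a write followed by a read), strips the npt bit, and handles a single wrap of a full 63-bit counter; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define TICK_NPT_BIT      (UINT64_C(1) << 63)
#define TICK_COUNTER_MASK (TICK_NPT_BIT - 1)      /* bits 62:0 */

/* Strand cycles between two TICK reads, ignoring TICK.npt. Subtracting
   modulo 2^63 keeps the result correct across one counter wrap. */
static uint64_t tick_elapsed(uint64_t first_read, uint64_t second_read) {
    return ((second_read & TICK_COUNTER_MASK) -
            (first_read & TICK_COUNTER_MASK)) & TICK_COUNTER_MASK;
}
```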


Programming Note: If a single TICK register is shared among multiple virtual
processors, then the difference between subsequent reads of
TICK.counter reflects a shared cycle count, not a count specific to
the virtual processor reading the TICK register.


IMPL. DEP. #105-V9: (a) If an accurate count cannot always be returned when TICK 
is read, any inaccuracy should be small, bounded, and documented. 

(b) An implementation may implement fewer than 63 bits in TICK.counter; however, 
the counter as implemented must be able to count for at least 10 years without 
overflowing. Any upper bits not implemented must read as zero. 
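Part (b) of IMPL. DEP. #105-V9 can be checked with simple arithmetic. This C sketch computes the smallest counter width that would not overflow within a given number of years at a given count rate, using a 365.25-day year. The clock rates in the usage note are hypothetical and are not taken from any UltraSPARC implementation.

```c
/* Smallest width b such that a b-bit counter, starting at 0 and counting
   at `hz` per second, does not overflow within `years`:
   requires 2^b - 1 >= counts, that is, 2^b > counts. */
static unsigned min_counter_bits(double hz, double years) {
    double counts = hz * years * 365.25 * 24.0 * 3600.0;
    double capacity = 1.0;          /* 2^bits */
    unsigned bits = 0;
    while (capacity <= counts) {
        capacity *= 2.0;
        bits++;
    }
    return bits;
}
```

At a hypothetical 4 GHz, 10 years is about 1.26 × 10¹⁸ counts, so min_counter_bits(4e9, 10) returns 61; at 1 GHz it returns 59. A full 63-bit counter would cover 10 years at rates up to roughly 29 GHz.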


Programming Note: TICK.npt may be used by a secure operating system to control
access by nonprivileged software to high-accuracy timing
information. The operation of the timer might be emulated by
the trap handler, which could read TICK.counter and "fuzz" the
value to lower accuracy.
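One possible way to "fuzz" the value, sketched in C: clear the low-order bits of the counter so the emulated RDTICK result advances only in coarse steps. This is a hypothetical illustration of the idea in the note; the function name and the choice of truncation (rather than, say, adding noise) are assumptions, not an architected mechanism.

```c
#include <stdint.h>

#define TICK_COUNTER_MASK ((UINT64_C(1) << 63) - 1)   /* bits 62:0 */

/* Return TICK.counter with its low `fuzz_bits` bits cleared, so the
   value advances in steps of 2^fuzz_bits cycles. */
static uint64_t fuzz_tick(uint64_t tick_value, unsigned fuzz_bits) {
    uint64_t counter = tick_value & TICK_COUNTER_MASK;   /* drop npt */
    if (fuzz_bits >= 63)
        return 0;
    return counter & ~((UINT64_C(1) << fuzz_bits) - 1);
}
```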


5.5.5 Program Counters (PC, NPC) (ASR 5)


The PC contains the address of the instruction currently being executed. The least- 
significant two bits of PC always contain zeroes. 


The PC can be read directly with the RDPC instruction. PC cannot be explicitly
written by any instruction (including Write State Register), but is implicitly written
by control transfer instructions. A WRasr to ASR 5 causes an illegal_instruction
exception.


The Next Program Counter, NPC, is a pseudo-register that contains the address of 
the next instruction to be executed if a trap does not occur. The least-significant two 
bits of NPC always contain zeroes. 


NPC is written implicitly by control transfer instructions. However, NPC cannot be 
read or written explicitly by any instruction. 


PC and NPC can be indirectly set by privileged software that writes to TPC[TL] 
and/or TNPC[TL] and executes a RETRY instruction. 
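A simplified C model of how PC and NPC advance, assuming 4-byte instructions (consistent with the low two bits of both always being zero) and ignoring delayed control transfers, which Chapter 6 describes. The type and function names are hypothetical; the RETRY case follows the description above of privileged software writing TPC[TL] and TNPC[TL].

```c
#include <stdint.h>

/* Hypothetical, simplified model of a strand's PC/NPC pair. */
typedef struct { uint64_t pc, npc; } pc_pair;

/* Sequential execution with no control transfer and no trap. */
static void step_sequential(pc_pair *s) {
    s->pc = s->npc;
    s->npc = s->npc + 4;
}

/* RETRY: resume at the values software placed in TPC[TL] and TNPC[TL]. */
static void retry(pc_pair *s, uint64_t tpc, uint64_t tnpc) {
    s->pc = tpc;
    s->npc = tnpc;
}
```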


See Chapter 6, Instruction Set Overview, for details on how PC and NPC are used. 


5.5.6 Floating-Point Registers State (FPRS) Register (ASR 6)


The Floating-Point Registers State (FPRS) register, shown in FIGURE 5-14, contains 
control information for the floating-point register file; this information is readable 
and writable by nonprivileged software. 
FIGURE 5-14 Floating-Point Registers State Register (bit 2: fef; bit 1: du; bit 0: dl)


The FPRS register may be explicitly read and written by the RDFPRS and WRFPRS 
instructions, respectively. 


Enable FPU (fef). Bit 2, fef, determines whether the FPU is enabled. If it is
disabled (FPRS.fef = 0), executing a floating-point instruction causes an fp_disabled
exception. If this bit is set (FPRS.fef = 1) but the PSTATE.pef bit is not set
(PSTATE.pef = 0), then executing a floating-point instruction also causes an fp_disabled
exception; that is, both FPRS.fef and PSTATE.pef must be set to 1 to enable
floating-point operations.
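The enabling rule is a simple conjunction. This minimal C sketch shows it with the FPRS bit positions from FIGURE 5-14; the macro and function names are hypothetical, and PSTATE.pef is passed in rather than read from any real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPRS_DL  (1u << 0)
#define FPRS_DU  (1u << 1)
#define FPRS_FEF (1u << 2)

/* Floating-point instructions execute only when both enables are 1;
   otherwise they raise fp_disabled. */
static bool fp_enabled(bool pstate_pef, uint8_t fprs) {
    return pstate_pef && (fprs & FPRS_FEF) != 0;
}
```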


Programming Note: FPRS.fef can be used by application software to notify system
software that the application does not require the contents of the
F registers to be preserved. Depending on system software, this
may provide some performance benefit; for example, the F
registers would not have to be saved or restored during context
switches to or from that application. Once an application sets
FPRS.fef to 0, it must assume that the values in all F registers
are volatile (may change at any time).





Dirty Upper Registers (du). Bit 1 is the "dirty" bit for the upper half of the
floating-point registers; that is, F[32]-F[62]. It is set to 1 whenever any of the upper
floating-point registers is modified. The du bit is cleared only by software.


IMPL. DEP. #403-S10(a): An UltraSPARC Architecture 2005 virtual processor may 
set FPRS.du pessimistically; that is, it may be set whenever an FPop is issued, even 
though no destination F register is modified. The specific conditions under which a 
dirty bit is set pessimistically are implementation dependent. 


Dirty Lower Registers (dl). Bit 0 is the "dirty" bit for the lower 32 floating-point 
registers; that is, F[0]-F[31]. It is set to 1 whenever any of the lower floating-point 
registers is modified. The dl bit is cleared only by software. 


IMPL. DEP. #403-S10(b): An UltraSPARC Architecture 2005 virtual processor may 
set FPRS.dl pessimistically; that is, it may be set whenever an FPop is issued, even 
though no destination F register is modified. The specific conditions under which a 
dirty bit is set pessimistically are implementation dependent. 
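A minimal C sketch of how system software might use the dirty bits, for example when deciding which half of the F registers to save on a context switch. The save policy and all names are hypothetical illustrations, not architectural requirements; because the bits may be set pessimistically (impl. dep. #403-S10), such a policy may save a half that was not actually modified.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPRS_DL (1u << 0)   /* some register in F[0]-F[31] modified */
#define FPRS_DU (1u << 1)   /* some register in F[32]-F[62] modified */

/* Dirty bit that a write to F[freg] sets (freg in 0..62). */
static uint8_t dirty_bit_for(unsigned freg) {
    return freg < 32 ? FPRS_DL : FPRS_DU;
}

/* Hypothetical policy: save only the halves marked dirty since
   software last cleared the bits. */
static void halves_to_save(uint8_t fprs, bool *save_lower, bool *save_upper) {
    *save_lower = (fprs & FPRS_DL) != 0;
    *save_upper = (fprs & FPRS_DU) != 0;
}
```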


Implementation Note: If an instruction that normally writes to the F registers is
executed and causes an fp_disabled exception, an UltraSPARC
Architecture 2005 implementation still sets the "dirty" bit
(FPRS.du or FPRS.dl) corresponding to the destination register
to ‘1’.
Forward Compatibility Note: It is expected that in future revisions of the UltraSPARC
Architecture, if an instruction that normally writes to the F
registers is executed and causes an fp_disabled exception, the
"dirty" bit (FPRS.du or FPRS.dl) corresponding to the
destination register will be left unchanged.


5.5.7 Performance Control Register (PCR) (ASR 16)


The PCR is used to control performance monitoring events collected in counter 
pairs, which are accessed via the Performance Instrumentation Counter (PIC) 
register (ASR 17) (see page 79). Unused PCR bits read as zero; they should be 
written only with zeroes or with values previously read from them. 


When the virtual processor is operating in privileged mode (PSTATE.priv = 1 and
HPSTATE.hpriv = 0) or hyperprivileged mode (HPSTATE.hpriv = 1), PCR may be
freely read and written by software.


When the virtual processor is operating in nonprivileged mode (PSTATE.priv = 0), an
attempt to access PCR (using a RDPCR or WRPCR instruction) results in a
privileged_opcode exception (impl. dep. #250-U3-Cs10).


The PCR is illustrated in FIGURE 5-15 and described in TABLE 5-11. 


FIGURE 5-15 Performance Control Register (PCR) (ASR 16) (fields: impl. dep. 47:32, impl. dep. 26:17, su 16:11, sl 9:4, impl. dep. 3, ut 2, st 1, priv 0)


IMPL. DEP. #207-U3: The values and semantics of bits 47:32, 26:17, and bit 3 of the 
PCR are implementation dependent. 


TABLE 5-11 PCR Bit Description

Bit     Field   Description
47:32   -       These bits are implementation dependent (impl. dep. #207-U3).
26:17   -       These bits are implementation dependent (impl. dep. #207-U3).
16:11   su      Six-bit field selecting 1 of 64 event counts in the upper half (bits {63:32}) of the PIC.
9:4     sl      Six-bit field selecting 1 of 64 event counts in the lower half (bits {31:0}) of the PIC.
3       -       This bit is implementation dependent (impl. dep. #207-U3).
2       ut      User Trace Enable. If set to 1, events in nonprivileged (user) mode are counted.
1       st      System Trace Enable. If set to 1, events in privileged (system) mode are counted.
                Notes:
                If both PCR.ut and PCR.st are set to 1, all selected events are counted.
                If both PCR.ut and PCR.st are zero, counting is disabled.
                PCR.ut and PCR.st are global fields that apply to all PIC pairs.
0       priv    Privileged. Controls access to the PIC register (via RDPIC or WRPIC instructions). If
                PCR.priv = 0, an attempt to access PIC will succeed regardless of the privilege state
                (PSTATE.priv). If PCR.priv = 1, access to PIC is restricted to privileged software; that is,
                an attempt to access PIC while PSTATE.priv = 1 will succeed, but an attempt to access PIC
                while PSTATE.priv = 0 will result in a privileged_action exception.
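A minimal C sketch that packs the architected PCR fields from TABLE 5-11 into a 64-bit value. The event numbers passed as su and sl are implementation specific, so the values in the test are placeholders; the macro and function names are hypothetical. On real hardware one would probably read PCR first and preserve the implementation-dependent bits (47:32, 26:17, and 3) rather than zeroing them as this sketch does.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCR_PRIV     (UINT64_C(1) << 0)
#define PCR_ST       (UINT64_C(1) << 1)
#define PCR_UT       (UINT64_C(1) << 2)
#define PCR_SL_SHIFT 4      /* bits 9:4 */
#define PCR_SU_SHIFT 11     /* bits 16:11 */

static uint64_t pcr_encode(unsigned su, unsigned sl, bool ut, bool st, bool priv) {
    uint64_t v = 0;
    v |= (uint64_t)(su & 0x3F) << PCR_SU_SHIFT;   /* event counted in picu */
    v |= (uint64_t)(sl & 0x3F) << PCR_SL_SHIFT;   /* event counted in picl */
    if (ut)   v |= PCR_UT;                        /* count in user mode */
    if (st)   v |= PCR_ST;                        /* count in system mode */
    if (priv) v |= PCR_PRIV;                      /* restrict PIC access */
    return v;
}
```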


5.5.8 Performance Instrumentation Counter (PIC) Register (ASR 17)


PIC contains two 32-bit counters that count performance-related events (such as 
instruction counts, cache misses, TLB misses, and pipeline stalls). Which events are 
actively counted at any given time is selected by the PCR register. 


The difference between the values read from the PIC register at two different times 
reflects the number of events that occurred between register reads. Software can only 
rely on the difference in counts between two PIC reads to get an accurate count, not 
on the difference in counts between a PIC write and a PIC read. 


PIC is normally a nonprivileged-access, read/write register. However, if the priv bit
of the PCR (ASR 16) is set, attempted access by nonprivileged (user) code causes a
privileged_action exception.


Multiple PICs may be implemented. Each is accessed through ASR 17, using an
implementation-dependent PIC pair selection field in PCR (ASR 16) (impl. dep.
#207-U3). Read/write access to the PIC will access the picu/picl counter pair selected
by PCR.


The PIC is described below and illustrated in FIGURE 5-16. 





Bit     Field   Description
63:32   picu    32-bit counter representing the count of an event selected by the su field of the
                Performance Control Register (PCR) (ASR 16).
31:0    picl    32-bit counter representing the count of an event selected by the sl field of the
                Performance Control Register (PCR) (ASR 16).

FIGURE 5-16 Performance Instrumentation Counter (PIC) (ASR 17) (bits 63:32: picu; bits 31:0: picl; both RW)


Counter Overflow. On overflow, the effective counter wraps to 0, SOFTINT
register bit 15 is set to 1, and an interrupt_level_15 trap is generated if not masked by
PSTATE.ie and PIL. The counter overflow trap is triggered on the transition from
value FFFF FFFF₁₆ to value 0.
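A minimal C sketch of splitting a 64-bit PIC value into picu and picl and computing the read-to-read event count for one 32-bit counter. Unsigned 32-bit subtraction absorbs a single wrap through 0; more than one wrap between reads cannot be detected from the counter alone (the overflow interrupt described above exists for that purpose). Function names are hypothetical.

```c
#include <stdint.h>

static uint32_t pic_upper(uint64_t pic) { return (uint32_t)(pic >> 32); } /* picu */
static uint32_t pic_lower(uint64_t pic) { return (uint32_t)pic; }         /* picl */

/* Events counted between two reads of the same 32-bit counter. */
static uint32_t pic_delta(uint32_t earlier, uint32_t later) {
    return later - earlier;      /* modulo 2^32 */
}
```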


5.5.9 General Status Register (GSR) (ASR 19) 


The General Status Register¹ (GSR) is a nonprivileged read/write register that is
implicitly referenced by many VIS instructions. The GSR can be read by the RDGSR
instruction (see Read Ancillary State Register on page 303) and written by the WRGSR
instruction (see Write Ancillary State Register on page 376).


If the FPU is disabled (PSTATE.pef = 0 or FPRS.fef = 0), an attempt to access this
register using an otherwise-valid RDGSR or WRGSR instruction causes an
fp_disabled exception.


The GSR is illustrated in FIGURE 5-17 and described in TABLE 5-12. 


FIGURE 5-17 General Status Register (GSR) (ASR 19) (fields: mask 63:32, reserved 31:28, im 27, irnd 26:25, reserved 24:8, scale 7:3, align 2:0)


TABLE 5-12 GSR Bit Description

Bit     Field   Description
63:32   mask    This 32-bit field specifies the mask used by the BSHUFFLE instruction. The field
                contents are set by the BMASK instruction.
31:28   -       Reserved.
27      im      Interval Mode: If GSR.im = 0, rounding is performed according to FSR.rd; if
                GSR.im = 1, rounding is performed according to GSR.irnd.
26:25   irnd    IEEE Std 754-1985 rounding direction to use in Interval Mode (GSR.im = 1), as follows:
                irnd   Round toward ...
                0      Nearest (even, if tie)
                1      0
                2      +∞
                3      −∞
24:8    -       Reserved.
7:3     scale   5-bit shift count in the range 0-31, used by the FPACK instructions for formatting.
2:0     align   Three least significant bits of the address computed by the last-executed
                ALIGNADDRESS or ALIGNADDRESS_LITTLE instruction.

1. This register was (inaccurately) referred to as the "Graphics Status Register" in early UltraSPARC
implementations.
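A minimal C sketch of extracting the GSR fields in TABLE 5-12 and selecting the rounding direction in effect. FSR.rd is passed in as a parameter because reading FSR is outside the scope of this sketch; all function names are hypothetical.

```c
#include <stdint.h>

static uint32_t gsr_mask(uint64_t gsr)  { return (uint32_t)(gsr >> 32); }      /* 63:32 */
static unsigned gsr_im(uint64_t gsr)    { return (unsigned)(gsr >> 27) & 0x1; } /* 27 */
static unsigned gsr_irnd(uint64_t gsr)  { return (unsigned)(gsr >> 25) & 0x3; } /* 26:25 */
static unsigned gsr_scale(uint64_t gsr) { return (unsigned)(gsr >> 3) & 0x1F; } /* 7:3 */
static unsigned gsr_align(uint64_t gsr) { return (unsigned)gsr & 0x7; }         /* 2:0 */

/* Rounding direction in effect: GSR.irnd in interval mode, else FSR.rd. */
static unsigned effective_rounding(uint64_t gsr, unsigned fsr_rd) {
    return gsr_im(gsr) ? gsr_irnd(gsr) : (fsr_rd & 0x3);
}
```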


5.5.10 SOFTINT Register (ASRs 20, 21, 22)


Software uses the privileged, read/write SOFTINT register (ASR 22) to schedule 
interrupts (via interrupt_level_n exceptions). 


SOFTINT can be read with a RDSOFTINT instruction (see Read Ancillary State
Register on page 303) and written with a WRSOFTINT, WRSOFTINT_SET, or
WRSOFTINT_CLR instruction (see Write Ancillary State Register on page 376). An
attempt to access this register in nonprivileged mode causes a privileged_opcode
exception.


Programming Note: To atomically modify the set of pending software interrupts, use
the SOFTINT_SET and SOFTINT_CLR ASRs.





The SOFTINT register is illustrated in FIGURE 5-18 and described in TABLE 5-13. 


FIGURE 5-18 SOFTINT Register (ASR 22) (bit 16: sm; bits 15:1: int_level; bit 0: tm; all RW)
TABLE 5-13 SOFTINT Bit Description

Bit     Field       Description
16      sm          When the STICK_CMPR (ASR 25) register's int_dis (interrupt disable) field is 0 (that is,
                    System Tick Compare is enabled) and its stick_cmpr field matches the value in the
                    STICK register, then SOFTINT.sm ("STICK match") is set to 1 and a level-14 interrupt
                    (interrupt_level_14) is generated. See System Tick Compare (STICK_CMPR) Register
                    (ASR 25) on page 85 for details. SOFTINT.sm can also be directly written to 1 by software.
15:1    int_level   When SOFTINT.int_level{n-1} (SOFTINT{n}) is set to 1, an interrupt_level_n exception is
                    generated.
                    Notes:
                    A level-14 interrupt (interrupt_level_14) can be triggered by SOFTINT.sm,
                    SOFTINT.tm, or a write to SOFTINT.int_level{13} (SOFTINT{14}).
                    A level-15 interrupt (interrupt_level_15) can be triggered by a write to
                    SOFTINT.int_level{14} (SOFTINT{15}), or possibly by other
                    implementation-dependent mechanisms.
                    An interrupt_level_n exception will only cause a trap if (PIL < n) and
                    (PSTATE.ie = 1) and (HPSTATE.hpriv = 0).
0       tm          When the TICK_CMPR (ASR 23) register's int_dis (interrupt disable) field is 0 (that is,
                    Tick Compare is enabled) and its tick_cmpr field matches the value in the TICK register,
                    then the tm ("TICK match") field in SOFTINT is set to 1 and a level-14 interrupt
                    (interrupt_level_14) is generated. See Tick Compare (TICK_CMPR) Register (ASR 23) on
                    page 83 for details. SOFTINT.tm can also be directly written to 1 by software.





Setting any of SOFTINT.sm, SOFTINT.int_level{13} (SOFTINT{14}), or SOFTINT.tm
to 1 causes a level-14 interrupt (interrupt_level_14). However, those three bits are
independent; setting any one of them does not affect the other two.
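The bit-to-level mapping in TABLE 5-13 and the trap condition in its notes can be sketched in C as follows. This models only the SOFTINT sources; other level-15 sources (such as the PIC overflow described in 5.5.8) arrive by setting bit 15, so they fall out of the same loop. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SOFTINT_TM (UINT64_C(1) << 0)    /* TICK match, requests level 14 */
#define SOFTINT_SM (UINT64_C(1) << 16)   /* STICK match, requests level 14 */

/* Highest interrupt level requested by SOFTINT, or 0 if none.
   Bit n (1-15) requests level n; tm and sm also request level 14. */
static unsigned softint_highest_level(uint64_t softint) {
    if (softint & (SOFTINT_TM | SOFTINT_SM))
        softint |= UINT64_C(1) << 14;
    for (unsigned n = 15; n >= 1; n--)
        if (softint & (UINT64_C(1) << n))
            return n;
    return 0;
}

/* An interrupt_level_n exception traps only if PIL < n, PSTATE.ie = 1,
   and HPSTATE.hpriv = 0. */
static bool softint_would_trap(unsigned n, unsigned pil, bool ie, bool hpriv) {
    return n > pil && ie && !hpriv;
}
```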


See Software Interrupt Register (SOFTINT) on page 512 for additional information 
regarding the SOFTINT register. 


5.5.10.1 SOFTINT_SET Pseudo-Register (ASR 20)


A Write State Register instruction to ASR 20 (WRSOFTINT_SET) atomically sets
selected bits in the privileged SOFTINT register (ASR 22) (see page 81). That is, bits
16:0 of the write data are ORed into SOFTINT; any '1' bit in the write data causes the
corresponding bit of SOFTINT to be set to 1. Bits 63:17 of the write data are ignored.


Access to ASR 20 is privileged and write-only. There is no instruction to read this
pseudo-register. An attempt to write to ASR 20 in nonprivileged mode, using the
WRasr instruction, causes a privileged_opcode exception.


Programming Note: There is no actual "register" (machine state) corresponding to
ASR 20; it is just a programming interface to conveniently set
selected bits to ‘1’ in the SOFTINT register, ASR 22.
FIGURE 5-19 illustrates the SOFTINT_SET pseudo-register.


FIGURE 5-19 SOFTINT_SET Pseudo-Register (ASR 20) (bits 16:0: W1S, write 1 to set; bits 63:17 ignored)


5.5.10.2 SOFTINT_CLR Pseudo-Register (ASR 21)


A Write State Register instruction to ASR 21 (WRSOFTINT_CLR) atomically clears
selected bits in the privileged SOFTINT register (ASR 22) (see page 81). That is, bits
16:0 of the write data are inverted and ANDed into SOFTINT; any '1' bit in the write
data causes the corresponding bit of SOFTINT to be set to 0. Bits 63:17 of the write
data are ignored.
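The effect of ASR 20 and ASR 21 on SOFTINT reduces to a masked OR and a masked AND-NOT. This C sketch models only the resulting SOFTINT value; the atomicity is provided by the hardware and cannot be reproduced by a plain software model like this. Function and macro names are hypothetical.

```c
#include <stdint.h>

#define SOFTINT_WRITABLE_MASK UINT64_C(0x1FFFF)   /* bits 16:0; 63:17 ignored */

/* SOFTINT after WRSOFTINT_SET (ASR 20) with write data `data`. */
static uint64_t softint_after_set(uint64_t softint, uint64_t data) {
    return softint | (data & SOFTINT_WRITABLE_MASK);
}

/* SOFTINT after WRSOFTINT_CLR (ASR 21) with write data `data`. */
static uint64_t softint_after_clr(uint64_t softint, uint64_t data) {
    return softint & ~(data & SOFTINT_WRITABLE_MASK);
}
```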


Access to ASR 21 is privileged and write-only. There is no instruction to read this
pseudo-register. An attempt to write to ASR 21 in nonprivileged mode, using the
WRasr instruction, causes a privileged_opcode exception.


Programming Note: There is no actual "register" (machine state) corresponding to
ASR 21; it is just a programming interface to conveniently clear
(set to ‘0’) selected bits in the SOFTINT register, ASR 22.





FIGURE 5-20 illustrates the SOFTINT_CLR pseudo-register.


FIGURE 5-20 SOFTINT_CLR Pseudo-Register (ASR 21) (bits 16:0: W1C, write 1 to clear; bits 63:17 ignored)


5.5.11 Tick Compare (TICK_CMPR) Register (ASR 23)


The privileged TICK_CMPR register allows system software to cause a trap when
the TICK register reaches a specified value. Nonprivileged accesses to this register
cause a privileged_opcode exception (see Exception and Interrupt Descriptions on page
497).


After a power-on reset trap, the int_dis bit is set to 1 (disabling Tick Compare
interrupts) and the value of the tick_cmpr field is undefined.


The TICK_CMPR register is illustrated in FIGURE 5-21 and described in TABLE 5-14.
FIGURE 5-21 TICK_CMPR Register (bit 63: int_dis; bits 62:0: tick_cmpr; both RW)


TABLE 5-14 TICK_CMPR Register Description

Bit     Field       Description
63      int_dis     Interrupt Disable. If int_dis = 0, TICK compare interrupts are enabled,
                    and if int_dis = 1, TICK compare interrupts are disabled.
62:0    tick_cmpr   Tick Compare Field. When this field exactly matches the value in
                    TICK.counter and TICK_CMPR.int_dis = 0, SOFTINT.tm is set to 1.
                    This has the effect of posting a level-14 interrupt to the virtual
                    processor, which causes an interrupt_level_14 trap when (PIL < 14)
                    and (PSTATE.ie = 1 and HPSTATE.hpriv = 0). The level-14 interrupt
                    handler must check SOFTINT{14}, SOFTINT{0} (tm), and
                    SOFTINT{16} (sm) to determine the source of the level-14 interrupt.
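A minimal C sketch of computing a TICK_CMPR value that requests a tick-match interrupt a given number of cycles after a TICK reading, with int_dis = 0. Because the match must be exact, a target that TICK.counter has already passed by the time the write takes effect would not match again until the counter wraps, so a caller would presumably leave some margin in `delta`. Names are hypothetical.

```c
#include <stdint.h>

#define TICK_CMPR_INT_DIS (UINT64_C(1) << 63)
#define TICK_CMPR_FIELD   ((UINT64_C(1) << 63) - 1)   /* bits 62:0 */

/* TICK_CMPR value that matches `delta` cycles after the TICK value `now`.
   Bit 63 (int_dis) is left 0, enabling the compare interrupt. */
static uint64_t tick_cmpr_after(uint64_t now, uint64_t delta) {
    uint64_t counter = now & TICK_CMPR_FIELD;        /* drop TICK.npt */
    return (counter + delta) & TICK_CMPR_FIELD;      /* wrap within 63 bits */
}
```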


5.5.12 System Tick (STICK) Register (ASR 24) 


The System Tick (STICK) register provides a counter that is synchronized across a 
system, useful for timestamping. The counter field of the STICK register is a 63-bit 
counter that increments at a rate determined by a clock signal external to the 
processor. 


Bit 63 of the STICK register is the nonprivileged trap (npt) bit, which controls
access to the STICK register by nonprivileged software.


The STICK register is illustrated in FIGURE 5-22 and described below. 


FIGURE 5-22 STICK Register (bit 63: npt; bits 62:0: counter)


Hyperprivileged software can always read the STICK register with the RDSTICK 
instruction and write it with the WRSTICK instruction. 


Privileged software can always read the STICK register with the RDSTICK 
instruction. 


Privileged software cannot write the STICK register; an attempt to execute the
WRSTICK instruction in privileged mode results in an illegal_instruction exception.


Nonprivileged software can read the STICK register by using the RDSTICK
instruction, but only when nonprivileged access to STICK is enabled (STICK.npt = 0)
by hyperprivileged software. If nonprivileged access is disabled (STICK.npt = 1), an
attempt by nonprivileged software to read the STICK register causes a
privileged_action exception.


Nonprivileged software cannot write the STICK register; an attempt to execute the
WRSTICK instruction in nonprivileged mode results in an illegal_instruction
exception.


After the STICK register is written, reading the STICK register returns a value 
incremented (by 1 or more) from the last value written, rather than from some 
previous value of counter. 


IMPL. DEP. #442-S10: (a) If an accurate count cannot always be returned when 
STICK is read, any inaccuracy should be small, bounded, and documented. 

(b) An implementation may implement fewer than 63 bits in STICK.counter; 
however, the counter as implemented must be able to count for at least 10 years 
without overflowing. Any upper bits not implemented must read as zero. 


After a power-on reset trap, STICK.npt is set to 1 and the value of STICK.counter is 
undefined. 


Note: The STICK register is unaffected by any reset other than a
power-on reset.


5.5.13 System Tick Compare (STICK_CMPR) Register (ASR 25)


The privileged STICK_CMPR register allows system software to cause a trap when
the STICK register reaches a specified value. Nonprivileged accesses to this register
cause a privileged_opcode exception (see Exception and Interrupt Descriptions on page
497).


After a power-on reset trap, the int_dis bit is set to 1 (disabling System Tick Compare
interrupts), and the stick_cmpr field is undefined.


The System Tick Compare Register is illustrated in FIGURE 5-23 and described in 
TABLE 5-15. 


FIGURE 5-23 STICK_CMPR Register (bit 63: int_dis; bits 62:0: stick_cmpr; both RW)
TABLE 5-15 STICK_CMPR Register Description

Bit     Field       Description
63      int_dis     Interrupt Disable. If set to 1, STICK_CMPR interrupts are disabled.
62:0    stick_cmpr  System Tick Compare Field. When this field exactly matches
                    STICK.counter and STICK_CMPR.int_dis = 0, SOFTINT.sm is set to
                    1. This has the effect of posting a level-14 interrupt to the virtual
                    processor, which causes an interrupt_level_14 trap when (PIL < 14)
                    and (PSTATE.ie = 1). The level-14 interrupt handler must check
                    SOFTINT{14}, SOFTINT{0} (tm), and SOFTINT{16} (sm) to
                    determine the source of the level-14 interrupt.
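Because the three level-14 sources are independent, a handler has to test each one; several may be set at once. A minimal C sketch of that decoding step follows (type and function names are hypothetical). A handler would then presumably clear the bits it has serviced, for example through SOFTINT_CLR (ASR 21).

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible sources of a level-14 interrupt, per the SOFTINT bits named above. */
typedef struct { bool tick_match, stick_match, software; } level14_sources;

static level14_sources decode_level14(uint64_t softint) {
    level14_sources s;
    s.tick_match  = (softint >> 0) & 1;    /* SOFTINT{0}, tm */
    s.stick_match = (softint >> 16) & 1;   /* SOFTINT{16}, sm */
    s.software    = (softint >> 14) & 1;   /* SOFTINT{14}, int_level{13} */
    return s;
}
```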








5.6 Register-Window PR State Registers


The state of the register windows is determined by the contents of a set of privileged
registers. These state registers can be read and written by privileged software using the
RDPR and WRPR instructions. An attempt by nonprivileged software to execute a
RDPR or WRPR instruction causes a privileged_opcode exception. In addition, these
registers are modified by instructions related to register windows and are used to
generate traps that allow supervisor software to spill, fill, and clean register
windows.


IMPL. DEP. #126-V9-Ms10: Privileged registers CWP, CANSAVE, CANRESTORE,
OTHERWIN, and CLEANWIN contain values in the range 0 to N_REG_WINDOWS - 1.
An attempt to write a value greater than N_REG_WINDOWS - 1 to any of these
registers causes an implementation-dependent value between 0 and
N_REG_WINDOWS - 1 (inclusive) to be written to the register. Furthermore, an attempt
to write a value greater than N_REG_WINDOWS - 2 violates the register window state
definition in Register Window State Definition on page 89.

Although the width of each of these five registers is architecturally 5 bits, the width
is implementation dependent and shall be between ⌈log2(N_REG_WINDOWS)⌉ and 5
bits, inclusive. If fewer than 5 bits are implemented, the unimplemented upper bits
shall read as 0 and writes to them shall have no effect. All five registers should have
the same width.

For UltraSPARC Architecture 2005 processors, N_REG_WINDOWS = 8. Therefore, each
register window state register is implemented with 3 bits, the maximum value for
CWP and CLEANWIN is 7, and the maximum value for CANSAVE, CANRESTORE,
and OTHERWIN is 6. When these registers are written by the WRPR instruction, bits
63:3 of the data written are ignored.
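The width rule and the 3-bit case above can be checked with a short C sketch: the first function computes ⌈log2(N_REG_WINDOWS)⌉ (assuming a minimum of 1 bit), and the second models how a 3-bit register would see WRPR data. Function names are hypothetical.

```c
#include <stdint.h>

/* Minimum register width, ceil(log2(n_reg_windows)), assuming at least 1 bit. */
static unsigned window_reg_width(unsigned n_reg_windows) {
    unsigned width = 1;
    while ((1u << width) < n_reg_windows)
        width++;
    return width;
}

/* WRPR data as stored by a 3-bit window-state register: bits 63:3 ignored. */
static uint64_t wrpr_window_value(uint64_t data) {
    return data & 0x7;
}
```

With N_REG_WINDOWS = 8, window_reg_width returns 3, matching the implemented width stated above.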


For details of how the window-management registers are used, see Register Window
Management Instructions on page 131.


Programming Note: CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN must
never be set to a value greater than N_REG_WINDOWS - 2 on an
UltraSPARC Architecture virtual processor. Setting any of these
to a value greater than N_REG_WINDOWS - 2 violates the register
window state definition in Register Window State Definition on
page 89. Hardware is not required to enforce this restriction; it is
up to system software to keep the window state consistent.


Implementation Note: A write to any privileged register, including PR state registers,
may drain the CPU pipeline.





5.6.1 Current Window Pointer (CWP) Register (PR 9)


The privileged CWP register, shown in FIGURE 5-24, is a counter that identifies the 
current window into the array of integer registers. See Register Window Management 
Instructions on page 131 and Chapter 12, Traps, for information on how hardware 
manipulates the CWP register. 


FIGURE 5-24 Current Window Pointer Register


5.6.2 Savable Windows (CANSAVE) Register (PR 10)


The privileged CANSAVE register, shown in FIGURE 5-25, contains the number of 
register windows following CWP that are not in use and are, hence, available to be 
allocated by a SAVE instruction without generating a window spill exception. 


FIGURE 5-25 CANSAVE Register


5.6.3 Restorable Windows (CANRESTORE) Register (PR 11)


The privileged CANRESTORE register, shown in FIGURE 5-26, contains the number of 
register windows preceding CWP that are in use by the current program and can be 
restored (by the RESTORE instruction) without generating a window fill exception. 


FIGURE 5-26 CANRESTORE Register


5.6.4 Clean Windows (CLEANWIN) Register (PR 12)


The privileged CLEANWIN register, shown in FIGURE 5-27, contains the number of 
windows that can be used by the SAVE instruction without causing a clean_window 
exception. 


FIGURE 5-27 CLEANWIN Register


The CLEANWIN register counts the number of register windows that are "clean" 
with respect to the current program; that is, register windows that contain only 
zeroes, valid addresses, or valid data from that program. Registers in these windows 
need not be cleaned before they can be used. The count includes the register 
windows that can be restored (the value in the CANRESTORE register) and the 
register windows following CWP that can be used without cleaning. When a clean 
window is requested (by a SAVE instruction) and none is available, a clean_window
exception occurs to cause the next window to be cleaned.
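The interaction of CANSAVE, CANRESTORE, and CLEANWIN on a SAVE can be sketched in C. The exact SAVE behavior is specified in Register Window Management Instructions on page 131, not in this section; this sketch assumes the SPARC V9 ordering (a spill check first, then a clean-window check against CLEANWIN minus CANRESTORE, which is the count of clean windows beyond the restorable ones described above). OTHERWIN handling and the trap itself are not modeled, and all names are hypothetical.

```c
#define N_REG_WINDOWS 8

typedef enum { SAVE_OK, SAVE_SPILL, SAVE_CLEAN_WINDOW } save_outcome;

typedef struct {
    unsigned cwp, cansave, canrestore, cleanwin, otherwin;
} window_state;

static save_outcome do_save(window_state *w) {
    if (w->cansave == 0)
        return SAVE_SPILL;                 /* no free window: spill needed */
    if (w->cleanwin == w->canrestore)
        return SAVE_CLEAN_WINDOW;          /* no clean window available */
    w->cwp = (w->cwp + 1) % N_REG_WINDOWS;
    w->cansave--;
    w->canrestore++;
    return SAVE_OK;
}
```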


5.6.5 Other Windows (OTHERWIN) Register (PR 13)


The privileged OTHERWIN register, shown in FIGURE 528, contains the count of 
register windows that will be spilled/filled by a separate set of trap vectors based on 
the contents of WSTATE.other. If OTHERWIN is zero, register windows are spilled/ 
filled by use of trap vectors based on the contents of WSTATE.normal. 
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5.6.6 


5.6.7 


The OTHERWIN register can be used to split the register windows among different 
address spaces and handle spill/fill traps efficiently by use of separate spill/fill 
vectors. 


RW RW 


4 32 0 
FIGURE 5-28 OTHERWIN Register 


Window State (WSTATE?) Register (PR 14) 


The privileged WSTATE register, shown in FIGURE 5-29, specifies bits that are inserted 
into TT[TL]{4:2} on traps caused by window spill and fill exceptions. These bits are 
used to select one of eight different window spill and fill handlers. If OTHERWIN = 0 
at the time a trap is taken because of a window spill or window fill exception, then 
the WSTATE.normal bits are inserted into TT[TL]. Otherwise, the WSTATE.other bits 
are inserted into TT[TL]. See Register Window State Definition, below, for details of the 
semantics of OTHERWIN. 
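As a sketch of how the WSTATE bits reach TT[TL]{4:2}, the C function below computes the trap type of a window spill or fill trap. It assumes the SPARC V9 trap-type bases (080₁₆ spill normal, 0A0₁₆ spill other, 0C0₁₆ fill normal, 0E0₁₆ fill other) and the SPARC V9 WSTATE layout (other in bits 5:3, normal in bits 2:0); the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trap-type bases for window traps, taken from the SPARC V9 trap table
 * (assumption: unchanged here). Names are illustrative only. */
enum {
    TT_SPILL_0_NORMAL = 0x080,
    TT_SPILL_0_OTHER  = 0x0A0,
    TT_FILL_0_NORMAL  = 0x0C0,
    TT_FILL_0_OTHER   = 0x0E0
};

/* Trap type for a window spill (is_fill = false) or fill (is_fill = true)
 * exception. Assumes WSTATE.other = bits 5:3, WSTATE.normal = bits 2:0. */
unsigned window_trap_type(bool is_fill, unsigned otherwin, uint64_t wstate)
{
    unsigned base, n;

    if (otherwin == 0) {
        base = is_fill ? TT_FILL_0_NORMAL : TT_SPILL_0_NORMAL;
        n = (unsigned)(wstate & 0x7);          /* WSTATE.normal */
    } else {
        base = is_fill ? TT_FILL_0_OTHER : TT_SPILL_0_OTHER;
        n = (unsigned)((wstate >> 3) & 0x7);   /* WSTATE.other */
    }
    return base | (n << 2);                    /* n lands in TT{4:2} */
}
```

For example, with WSTATE = 0A₁₆ (other = 1, normal = 2), a spill with OTHERWIN = 0 yields trap type 088₁₆, while the same spill with OTHERWIN nonzero yields 0A4₁₆.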


FIGURE 5-29 WSTATE Register (other: bits 5:3; normal: bits 2:0)


5.6.7 Register Window Management


The state of the register windows is determined by the contents of the set of 
privileged registers described in Register-Window PR State Registers on page 86. 
Those registers are affected by the instructions described in Register Window 
Management Instructions on page 131. Privileged software can read/write these state 
registers directly by using RDPR/WRPR instructions. 


5.6.7.1 Register Window State Definition 


For the state of the register windows to be consistent, the following must always be 
true: 


CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS - 2


FIGURE 5-3 on page 54 shows how the register windows are partitioned to obtain the 
above equation. The partitions are as follows: 


• The current window plus the window that must not be used because it overlaps
two other valid windows. In FIGURE 5-3, these are windows 0 and 5, respectively.
They are always present and account for the "2" subtracted from N_REG_WINDOWS
in the right-hand side of the above equation.

• Windows that do not have valid contents and that can be used (through a SAVE
instruction) without causing a spill trap. These windows (windows 1-4 in
FIGURE 5-3) are counted in CANSAVE.


• Windows that have valid contents for the current address space and that can be
used (through the RESTORE instruction) without causing a fill trap. These
windows (window 7 in FIGURE 5-3) are counted in CANRESTORE.


• Windows that have valid contents for an address space other than the current
address space. An attempt to use these windows through a SAVE (RESTORE)
instruction results in a spill (fill) trap to a separate set of trap vectors, as discussed
in the following subsection. These windows (window 6 in FIGURE 5-3) are counted
in OTHERWIN.


In addition,

CLEANWIN ≥ CANRESTORE

since CLEANWIN is the sum of CANRESTORE and the number of clean windows
following CWP.


For the window-management features of the architecture described in this section to 
be used, the state of the register windows must be kept consistent at all times, except 
within the trap handlers for window spilling, filling, and cleaning. While window 
traps are being handled, the state may be inconsistent. Window spill/fill trap 
handlers should be written so that a nested trap can be taken without destroying 
state. 


Programming | System software is responsible for keeping the state of the
Note | register windows consistent at all times. Failure to do so will
cause undefined behavior. For example, CANSAVE,
CANRESTORE, and OTHERWIN must never be greater than or
equal to N_REG_WINDOWS - 1.
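The consistency rules above can be checked mechanically. The following C sketch is a hypothetical helper (not an architectural interface) that tests a snapshot of CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN against them for a given N_REG_WINDOWS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the register-window PR state registers. */
struct win_state {
    unsigned cansave, canrestore, otherwin, cleanwin;
};

/* Checks the consistency rules of this section for an implementation with
 * nwin (N_REG_WINDOWS) register windows. */
bool win_state_consistent(const struct win_state *s, unsigned nwin)
{
    assert(nwin >= 3);   /* assumed minimum for this sketch */
    if (s->cansave + s->canrestore + s->otherwin != nwin - 2)
        return false;
    if (s->cleanwin < s->canrestore)   /* CLEANWIN >= CANRESTORE */
        return false;
    /* Each count must stay below N_REG_WINDOWS - 1. This is implied by the
     * sum above for unsigned counts; it is checked to mirror the note. */
    return s->cansave < nwin - 1 && s->canrestore < nwin - 1
        && s->otherwin < nwin - 1;
}
```

With N_REG_WINDOWS = 8 and the partition described for FIGURE 5-3 (CANSAVE = 4, CANRESTORE = 1, OTHERWIN = 1), the sum is 6 = 8 - 2, so the state is consistent as long as CLEANWIN ≥ 1.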


5.6.7.2 Register Window Traps 


Window traps are used to manage overflow and underflow conditions in the register 
windows, support clean windows, and implement the FLUSHW instruction. 


See Register Window Traps on page 506 for a detailed description of how fill, spill, and 
clean window traps support register windowing. 
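To make the interplay of the counters concrete, the sketch below models how a SAVE or RESTORE either proceeds or raises a spill, fill, or clean_window trap. It assumes the SPARC V9 SAVE/RESTORE semantics (the spill condition is checked before clean_window, and neither instruction changes CLEANWIN); the types and function names are hypothetical, and trap delivery itself is not modeled.

```c
#include <assert.h>

enum win_outcome { WIN_OK, WIN_SPILL, WIN_FILL, WIN_CLEAN_WINDOW };

/* Hypothetical bundle of CWP and the window-count registers. */
struct win_regs {
    unsigned cwp, cansave, canrestore, otherwin, cleanwin;
};

/* SAVE: check order (spill before clean_window) follows SPARC V9. */
enum win_outcome do_save(struct win_regs *r, unsigned nwin)
{
    if (r->cansave == 0)
        return WIN_SPILL;          /* vector chosen by OTHERWIN and WSTATE */
    if (r->cleanwin == r->canrestore)
        return WIN_CLEAN_WINDOW;   /* no clean window left after CWP */
    r->cwp = (r->cwp + 1) % nwin;
    r->cansave--;
    r->canrestore++;               /* CLEANWIN is unchanged */
    return WIN_OK;
}

/* RESTORE: traps if no restorable window is available. */
enum win_outcome do_restore(struct win_regs *r, unsigned nwin)
{
    if (r->canrestore == 0)
        return WIN_FILL;
    r->cwp = (r->cwp + nwin - 1) % nwin;
    r->cansave++;
    r->canrestore--;
    return WIN_OK;
}
```

Because CLEANWIN counts CANRESTORE plus the clean windows after CWP, the test CLEANWIN == CANRESTORE is exactly the condition "no clean window is available".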
5.7 Non-Register-Window PR State Registers


The registers described in this section are visible only to software running in 
privileged or hyperprivileged mode (that is, when PSTATE.priv = 1 or 
HPSTATE.hpriv = 1), and may be accessed with the WRPR and RDPR instructions. 
(An attempt to execute a WRPR or RDPR instruction in nonprivileged mode causes
a privileged_opcode exception.)


Each virtual processor provides a full set of these state registers. 


Implementation | A write to any privileged register, including PR state registers, 
Note | may drain the CPU pipeline. 





5.7.1 Trap Program Counter (TPC) Register (PR 0)


The privileged Trap Program Counter register (TPC; FIGURE 5-30) contains the 
program counter (PC) from the previous trap level. There are MAXTL instances of the 
TPC, but only one is accessible at any time. The current value in the TL register 
determines which instance of the TPC[TL] register is accessible. An attempt to read 
or write the TPC register when TL = 0 causes an illegal instruction exception. 


FIGURE 5-30 Trap Program Counter Register Stack (TPC[n] holds pc_high62 = PC{63:2} from the trap taken while TL = n - 1; bits 1:0 read as 00)


After a power-on reset, the contents of TPC[1] through TPC[MAXTL] are undefined.
During normal operation, the value of TPC[n], where n is greater than the current
trap level (n > TL), is undefined.


TABLE 5-16 lists the events that cause TPC to be read or written. 
TABLE 5-16 Events that involve TPC, when executing with TL = n.

Event                   Effect
Trap                    TPC[n + 1] ← PC
RETRY instruction       PC ← TPC[n]
RDPR (TPC)              R[rd] ← TPC[n]
WRPR (TPC)              TPC[n] ← value
Power-on reset (POR)    All TPC values are left undefined





5.7.2 Trap Next PC (TNPC) Register (PR 1)


The privileged Trap Next Program Counter register (TNPC; FIGURE 5-31) contains the next
program counter (NPC) from the previous trap level. There are MAXTL instances of
the TNPC, but only one is accessible at any time. The current value in the TL register
determines which instance of the TNPC register is accessible. An attempt to read or
write the TNPC register when TL = 0 causes an illegal instruction exception.

FIGURE 5-31 Trap Next Program Counter Register Stack (TNPC[n] holds npc_high62 = NPC{63:2} from the trap taken while TL = n - 1; bits 1:0 read as 00)





After a power-on reset, the contents of TNPC[1] through TNPC[MAXTL] are
undefined. During normal operation, the value of TNPC[n], where n is greater than
the current trap level (n > TL), is undefined.


TABLE 5-17 lists the events that cause TNPC to be read or written. 


TABLE 5-17 Events that involve TNPC, when executing with TL = n.

Event                   Effect
Trap                    TNPC[n + 1] ← NPC
DONE instruction        PC ← TNPC[n]; NPC ← TNPC[n] + 4
RETRY instruction       NPC ← TNPC[n]
RDPR (TNPC)             R[rd] ← TNPC[n]
WRPR (TNPC)             TNPC[n] ← value
Power-on reset (POR)    All TNPC values are left undefined
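TABLE 5-16 and TABLE 5-17 together determine where execution resumes after a trap handler finishes. The C sketch below (hypothetical names) computes the PC/NPC pair produced by RETRY and by DONE from TPC[n] and TNPC[n]; it ignores the PSTATE.am masking question raised by impl. dep. #417-S10.

```c
#include <assert.h>
#include <stdint.h>

struct pc_pair { uint64_t pc, npc; };

/* RETRY re-executes the trapped instruction: PC <- TPC[n], NPC <- TNPC[n]. */
struct pc_pair retry_target(uint64_t tpc_n, uint64_t tnpc_n)
{
    struct pc_pair p = { tpc_n, tnpc_n };
    return p;
}

/* DONE skips it: PC <- TNPC[n], NPC <- TNPC[n] + 4. TPC[n] is not used. */
struct pc_pair done_target(uint64_t tpc_n, uint64_t tnpc_n)
{
    (void)tpc_n;
    struct pc_pair p = { tnpc_n, tnpc_n + 4 };
    return p;
}
```

For a trap taken with PC = 1000₁₆ and NPC = 1004₁₆, RETRY resumes at 1000₁₆ (re-executing the trapping instruction) and DONE resumes at 1004₁₆ with NPC = 1008₁₆.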
5.7.3 Trap State (TSTATE) Register (PR 2)


The privileged Trap State register (TSTATE; FIGURE 5-32) contains the state from the
previous trap level, comprising the contents of the GL, CCR, ASI, CWP, and PSTATE 
registers from the previous trap level. There are MAXTL instances of the TSTATE 
register, but only one is accessible at a time. The current value in the TL register 
determines which instance of TSTATE is accessible. An attempt to read or write the 
TSTATE register when TL = 0 causes an illegal instruction exception. 








FIGURE 5-32 Trap State (TSTATE) Register Stack (TSTATE[n] holds the gl, ccr, asi, pstate, and cwp fields saved from the trap taken while TL = n - 1)


After a power-on reset, the contents of TSTATE[1] through TSTATE[MAXTL] are
undefined. During normal operation, the value of TSTATE[n], when n is greater than
the current trap level (n > TL), is undefined.


V9 Compatibility | Because the UltraSPARC Architecture adds bits to the PSTATE
Note | register, a 13-bit PSTATE value is stored in TSTATE instead of
the 10-bit value specified in the SPARC V9 architecture.
TABLE 5-19 lists the events that cause TSTATE to be read or written.


TABLE 5-19 Events That Involve TSTATE, When Executing with TL = n

Event                   Effect
Trap                    TSTATE[n + 1] ← (registers)
DONE instruction        (registers) ← TSTATE[n]
RETRY instruction       (registers) ← TSTATE[n]
RDPR (TSTATE)           R[rd] ← TSTATE[n]
WRPR (TSTATE)           TSTATE[n] ← value
Power-on reset (POR)    All TSTATE values are left undefined
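The Trap rows of TABLE 5-16, TABLE 5-17, and TABLE 5-19 all write entry n + 1 of their stacks, and only entry TL is visible to RDPR/WRPR. The following C model sketches that behavior under stated assumptions: MAXTL is fixed at the UltraSPARC Architecture 2005 value of 6, the saved TSTATE is treated as an opaque packed value, TT is included for completeness, and a trap at TL = MAXTL is rejected rather than modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXTL 6   /* UltraSPARC Architecture 2005 value */

/* Hypothetical model: index n holds TPC[n], TNPC[n], TSTATE[n], TT[n]. */
struct trap_stack {
    unsigned tl;
    uint64_t tpc[MAXTL + 1], tnpc[MAXTL + 1], tstate[MAXTL + 1];
    unsigned tt[MAXTL + 1];
};

/* A trap taken at TL = n fills entry n + 1 and then sets TL = n + 1.
 * saved_state stands in for the packed GL/CCR/ASI/PSTATE/CWP value. */
void take_trap(struct trap_stack *s, uint64_t pc, uint64_t npc,
               uint64_t saved_state, unsigned trap_type)
{
    assert(s->tl < MAXTL);   /* behavior at TL = MAXTL is not modeled */
    unsigned n = s->tl + 1;
    s->tpc[n] = pc;
    s->tnpc[n] = npc;
    s->tstate[n] = saved_state;
    s->tt[n] = trap_type;
    s->tl = n;
}

/* RDPR of TPC: only TPC[TL] is visible; TL = 0 means illegal_instruction. */
bool rdpr_tpc(const struct trap_stack *s, uint64_t *out)
{
    if (s->tl == 0)
        return false;
    *out = s->tpc[s->tl];
    return true;
}
```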





5.7.4 Trap Type (TT) Register (PR 3)


The privileged Trap Type register (TT; see FIGURE 5-33) contains the trap type of the 
trap that caused entry to the current trap level. On a reset trap, the TT register 
contains the trap type of the reset (see TABLE 12-2 on page 460). There are MAXTL 
instances of the TT register, but only one is accessible at a time. The current value in 
the TL register determines which instance of the TT register is accessible. An attempt 
to read or write the TT register when TL = 0 causes an illegal instruction exception. 












FIGURE 5-33 Trap Type Register Stack (TT[n] holds the trap type from the trap taken while TL = n - 1)


After a power-on reset, the contents of TT[1] through TT[MAXTL - 1] are undefined
and TT[MAXTL] = 001₁₆. During normal operation, the value of TT[n], where n is
greater than the current trap level (n > TL), is undefined.


TABLE 5-20 lists the events that cause TT to be read or written. 
TABLE 5-20 Events that involve TT, when executing with TL = n.

Event                   Effect
Trap                    TT[n + 1] ← (trap type)
RDPR (TT)               R[rd] ← TT[n]
WRPR (TT)               TT[n] ← value
Power-on reset (POR)    TT[1] through TT[MAXTL - 1] are left undefined;
                        TT[MAXTL] ← 001₁₆


5.7.5 Trap Base Address (TBA) Register (PR 5)


The privileged Trap Base Address register (TBA), shown in FIGURE 5-34, provides the 
upper 49 bits (bits 63:15) of the virtual address used to select the trap vector for a 
trap that is to be delivered to privileged mode. The lower 15 bits of the TBA always 
read as zero, and writes to them are ignored. 
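A minimal sketch of the TBA write behavior described above, assuming a 64-bit value written with WRPR; the helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TBA_LOW_BITS ((UINT64_C(1) << 15) - 1)   /* bits 14:0 */

/* Value read back from TBA after WRPR writes `value`: bits 14:0 are ignored
 * on write and read as zero, so only tba_high49 (bits 63:15) is kept. */
uint64_t tba_after_write(uint64_t value)
{
    return value & ~TBA_LOW_BITS;
}
```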


FIGURE 5-34 Trap Base Address Register (bits 63:15: tba_high49; bits 14:0 read as zero)


Details on how the full address for a trap vector is generated, using TBA and other
state, are provided in Trap-Table Entry Address to Privileged Mode on page 469.


5.7.6 Processor State (PSTATE) Register (PR 6)


The privileged Processor State register (PSTATE), shown in FIGURE 5-35, contains 
control fields for the current state of the virtual processor. There is only one instance 
of the PSTATE register per virtual processor. 


FIGURE 5-35 PSTATE Field


Writes to PSTATE are nondelayed; that is, new machine state written to PSTATE is 
visible to the next instruction executed. The privileged RDPR and WRPR 
instructions are used to read and write PSTATE, respectively. 


The following subsections describe the fields of the PSTATE register. 
Current Little Endian (cle). This bit affects the endianness of data accesses
performed using an implicit ASI. When PSTATE.cle = 1, all data accesses using an 
implicit ASI are performed in little-endian byte order. When PSTATE.cle = 0, all data 
accesses using an implicit ASI are performed in big-endian byte order. Specific ASIs 
used are shown in TABLE 6-3 on page 122. Note that the endianness of a data access 
may be further affected by TTE.ie used by the MMU. 


Instruction accesses are unaffected by PSTATE.cle and are always performed in big- 
endian byte order. 


Trap Little Endian (tle). When a trap is taken, the current PSTATE register is
pushed onto the trap stack.


During a virtual processor trap to privileged mode, the PSTATE.tle bit is copied into 
PSTATE.cle in the new PSTATE register. This behavior allows system software to 
have a different implicit byte ordering than the current process. Thus, if PSTATE.tle 
is set to 1, data accesses using an implicit ASI in the trap handler are little-endian. 


The original state of PSTATE.cle is restored when the original PSTATE register is 
restored from the trap stack. During a virtual processor trap to hyperprivileged 
mode, the PSTATE.tle bit is not copied into PSTATE.cle of the new PSTATE register 
and is unused. 
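The sketch below illustrates the two endianness rules just described: a data access through an implicit ASI uses the byte order selected by PSTATE.cle, and a trap to privileged mode copies PSTATE.tle into the handler's PSTATE.cle. It assumes tle itself carries over unchanged (as in SPARC V9), does not model TTE.ie or traps to hyperprivileged mode, and uses hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two endianness bits of PSTATE. */
struct endian_bits { bool cle, tle; };

/* On a trap to privileged mode, the handler's cle is the old tle.
 * Assumption: tle itself carries over unchanged. */
struct endian_bits on_trap_to_privileged(struct endian_bits old)
{
    struct endian_bits next = old;
    next.cle = old.tle;
    return next;
}

/* 32-bit load through an implicit ASI: byte order follows PSTATE.cle. */
uint32_t load32_implicit(const uint8_t b[4], bool cle)
{
    if (cle)   /* little-endian: b[0] is the least-significant byte */
        return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
               (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    /* big-endian: b[0] is the most-significant byte */
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8 | (uint32_t)b[3];
}
```

Because the interrupted program's PSTATE is saved in TSTATE, its own cle value comes back when DONE or RETRY restores PSTATE.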


Memory Model (mm). This 2-bit field determines the memory model in use by 
the virtual processor. The defined values for an UltraSPARC Architecture virtual 
processor are listed in TABLE 5-21. 


TABLE 5-21 PSTATE.mm Encodings 





mm Value Selected Memory Model 
00 Total Store Order (TSO) 
01 Reserved 
10 Implementation dependent (impl. dep. #113-V9-Ms10) 
11 Implementation dependent (impl. dep. #113-V9-Ms10) 


The current memory model is determined by the value of PSTATE.mm. Software
should refrain from writing the values 01₂, 10₂, or 11₂ to PSTATE.mm because they
are implementation-dependent or reserved for future extensions to the architecture,
and in any case not currently portable across implementations.


• Total Store Order (TSO): Loads are ordered with respect to earlier loads. Stores
are ordered with respect to earlier loads and stores. Thus, loads can bypass earlier
stores but cannot bypass earlier loads; stores cannot bypass earlier loads or stores.


IMPL. DEP. #113-V9-Ms10: Whether memory models represented by
PSTATE.mm = 10₂ or 11₂ are supported in an UltraSPARC Architecture processor is
implementation dependent. If the 10₂ model is supported, then when
PSTATE.mm = 10₂, the implementation must correctly execute software that adheres
to the RMO model described in The SPARC Architecture Manual-Version 9. If the 11₂
model is supported, its definition is implementation dependent.


96 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


IMPL. DEP. #119-Ms10: The effect of writing an unimplemented memory model 
designation into PSTATE.mm is implementation dependent. 


SPARC V9 | The PSO memory model described in SPARC V8 and SPARC V9 
Compatibility | architecture specifications was never implemented in a SPARC 
Notes | V9 implementation and is not included in the UltraSPARC 
Architecture specification. 


The RMO memory model described in the SPARC V9 
specification was implemented in some non-Sun SPARC V9 
implementations, but is not directly supported in UltraSPARC 
Architecture 2005 implementations. All software written to run 
correctly under RMO will run correctly under TSO on an 
UltraSPARC Architecture 2005 implementation. 





Enable FPU (pef). When set to 1, the PSTATE.pef bit enables the floating-point 
unit. This allows privileged software to manage the FPU. For the FPU to be usable, 
both PSTATE.pef and FPRS.fef must be set to 1. Otherwise, any floating-point 
instruction that tries to reference the FPU causes an fp disabled trap. 


If an implementation does not contain a hardware FPU, PSTATE.pef always reads as 
0 and writes to it are ignored. 
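As a small illustration of the two-level FPU enable, the check below assumes the SPARC V9 bit positions PSTATE.pef = bit 4 and FPRS.fef = bit 2; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSTATE_PEF (UINT64_C(1) << 4)   /* assumed SPARC V9 position */
#define FPRS_FEF   (UINT64_C(1) << 2)   /* assumed SPARC V9 position */

/* True if a floating-point instruction may use the FPU; false means the
 * instruction would cause an fp_disabled trap. */
bool fpu_usable(uint64_t pstate, uint64_t fprs)
{
    return (pstate & PSTATE_PEF) && (fprs & FPRS_FEF);
}
```

On an implementation without a hardware FPU, PSTATE.pef always reads as 0, so this check is always false there.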


Address Mask (am). The PSTATE.am bit is provided to allow 32-bit SPARC 
software to run correctly on a 64-bit SPARC V9 processor, by masking out (zeroing) 
bits 63:32 of virtual addresses at appropriate times. 


When PSTATE.am = 0, the full 64 bits of all instruction and data addresses are
preserved at all times.


When PSTATE.am = 1, bits 63:32 of instruction and data virtual addresses are
masked out (treated as 0).


Programming | It is the responsibility of privileged and hyperprivileged
Note | software to manage the setting of the PSTATE.am bit, since
hardware masks virtual addresses when PSTATE.am = 1.

Misuse of the PSTATE.am bit can result in undesirable behavior.
PSTATE.am should not be set to 1 in privileged or
hyperprivileged mode.

The PSTATE.am bit should always be set to 1 when 32-bit
software is executed.
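The masking rule can be expressed as a one-line C function. This is a sketch with a hypothetical name; it applies to the masking cases listed next, not to the cases in which bits 63:32 are explicitly preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Virtual address as presented outside the virtual processor (MMU, caches).
 * With PSTATE.am = 1, bits 63:32 are masked (treated as 0). */
uint64_t masked_va(uint64_t va, bool pstate_am)
{
    return pstate_am ? (va & UINT64_C(0xFFFFFFFF)) : va;
}
```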
Instances in which the more-significant 32 bits of a virtual address are masked
include: 


• Before any data address is sent out of the virtual processor (notably, to the
memory system, which includes MMU, internal caches, and external caches)


• Before any instruction address is sent out of the virtual processor (notably, to the
memory system, which includes MMU, internal caches, and external caches)


• When the value of PC is stored to a general-purpose register by a CALL, JMPL, or
RDPC instruction (closed impl. dep. #125-V9-Cs10)


• When the values of PC and NPC are written to TPC[TL] and TNPC[TL]
(respectively) during a trap (closed impl. dep. #125-V9-Cs10)


• Before any virtual address is sent to a watchpoint comparator


Programming | A 64-bit comparison is always used when performing a masked
Note | watchpoint address comparison with the Instruction or Data VA
watchpoint register. When PSTATE.am = 1, the more-significant
32 bits of the VA watchpoint register must be zero for a match
(and resulting trap) to occur.


• When an exception occurs and an address is written to the Data Synchronous
Fault Address register (D-SFAR) (impl. dep. #241-U3)


Programming | If a memory access is initiated when PSTATE.am = 1, the
Note | memory system sees only a 32-bit memory address.
Therefore, if such a memory access causes an exception or error,
the memory system can report only a 32-bit address
in the D-SFAR register (that is, a 64-bit address with the more-significant
32 bits set to 0).
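The watchpoint Programming Note above has a practical consequence: with PSTATE.am = 1, a VA watchpoint register whose upper 32 bits are nonzero can never match. The hypothetical model below omits the watchpoint mask and shows only the masking of the address followed by a full 64-bit comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VA watchpoint check. The VA is masked by PSTATE.am before it
 * reaches the comparator, but the comparison is always 64 bits wide.
 * wp_reg stands in for the VA watchpoint register; its mask is omitted. */
bool va_watchpoint_hit(uint64_t va, bool pstate_am, uint64_t wp_reg)
{
    uint64_t seen = pstate_am ? (va & UINT64_C(0xFFFFFFFF)) : va;
    return seen == wp_reg;
}
```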


When PSTATE.am = 1, the more-significant 32 bits of a virtual address are explicitly
preserved and not masked out in the following cases:


• When a target address is written to NPC by a control transfer instruction

Forward | This behavior is expected to change in the next revision of the
Compatibility | architecture, such that implementations will explicitly mask out
Note | (not preserve) the more-significant 32 bits, in this case.


• When NPC is incremented to NPC + 4 during execution of an instruction that is
not a taken control transfer

Forward | This behavior is expected to change in the next revision of the
Compatibility | architecture, such that implementations will explicitly mask out
Note | (not preserve) the more-significant 32 bits, in this case.


• When a WRPR instruction writes to TPC[TL] or TNPC[TL]

Programming | Since writes to PSTATE are nondelayed (see page 95), a change
Note | to PSTATE.am can affect which instruction is executed
immediately after the write to PSTATE.am. Specifically, if a
WRPR to the PSTATE register changes the value of PSTATE.am
from '0' to '1', and NPC{63:32} when the WRPR began execution
was nonzero, then the next instruction executed after the WRPR
will be from the address indicated in NPC{31:0} (with the
more-significant 32 address bits set to zero).


• When a RDPR instruction reads from TPC[TL] or TNPC[TL]
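The difference between masked addresses and preserved NPC values can be sketched as follows (hypothetical names): for a sequential instruction with PSTATE.am = 1, the address sent to the memory system has bits 63:32 cleared, while NPC + 4 keeps them. The same masking of the fetch address is what makes the WRPR-to-PSTATE.am case in the Programming Note above fetch from NPC{31:0}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct seq_step {
    uint64_t fetch_addr;   /* address sent to the memory system */
    uint64_t next_npc;     /* architectural NPC after the instruction */
};

/* One non-control-transfer step with PSTATE.am = am: the fetch address is
 * masked, while NPC + 4 keeps bits 63:32 in this revision. */
struct seq_step sequential_step(uint64_t npc, bool am)
{
    struct seq_step s;
    s.fetch_addr = am ? (npc & UINT64_C(0xFFFFFFFF)) : npc;
    s.next_npc = npc + 4;
    return s;
}
```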





If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE or RETRY instruction is executed¹,
it is implementation dependent whether the DONE or RETRY instruction masks
(zeroes) the more-significant 32 bits of the values it places into PC and NPC (impl.
dep. #417-S10).


IMPL. DEP. #443-S10: In hyperprivileged mode, when PSTATE.am = 1 and 
physical addressing is being used, it is implementation-dependent whether the 
more-significant 32 bits of addresses are masked (treated as zero). 


Programming | Because of implementation dependency #417-S10, great care
Note | must be taken in trap handler software if
TSTATE[TL].pstate.am = 1 and the trap handler wishes to write
a nonzero value to the more-significant 32 bits of TPC[TL] or
TNPC[TL].


Privileged Mode (priv). When PSTATE.priv = 1 and HPSTATE.hpriv = 0, the 
virtual processor is operating in privileged mode. 


When PSTATE.priv = 0 and HPSTATE.hpriv = 0, the virtual processor is operating in
nonprivileged mode.


When HPSTATE.hpriv = 1, the virtual processor is operating in hyperprivileged 
mode, independent of the state of PSTATE.priv. Hyperprivileged mode provides a 
superset of the capabilities of privileged mode. 
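The mode rules above reduce to a small decision function; the C sketch uses hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum exec_mode { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED };

/* Current execution mode from PSTATE.priv and HPSTATE.hpriv. */
enum exec_mode current_mode(bool pstate_priv, bool hpstate_hpriv)
{
    if (hpstate_hpriv)
        return MODE_HYPERPRIVILEGED;   /* PSTATE.priv is ignored */
    return pstate_priv ? MODE_PRIVILEGED : MODE_NONPRIVILEGED;
}
```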


PSTATE interrupt enable (ie). PSTATE.ie controls when the virtual processor
can take traps due to disrupting exceptions (such as interrupts or errors unrelated to
instruction processing).


¹ Which sets PSTATE.am to '1' by restoring the value of TSTATE[TL].pstate.am to PSTATE.am.
Outstanding disrupting exceptions that are destined for privileged mode can only
cause a trap when the virtual processor is in nonprivileged or privileged mode and
PSTATE.ie = 1. At all other times, they are held pending. For more details, see
Conditioning of Disrupting Traps on page 464.


Outstanding disrupting exceptions that are destined for hyperprivileged mode can
only cause a trap when the virtual processor is not in hyperprivileged mode, or
when it is in hyperprivileged mode and PSTATE.ie = 1. At all other times, they are
held pending. For more details, see Conditioning of Disrupting Traps on page 464.


SPARC V9 | Since the UltraSPARC Architecture provides a more general 
Compatibility | "alternate globals" facility (through use of the GL register) than 
Note | does SPARC V9, an UltraSPARC Architecture processor does not 
implement the SPARC V9 PSTATE.ag bit. 
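The two conditioning rules can be sketched as a single predicate. This covers only the mode and PSTATE.ie conditions stated here; other conditions (for example, PIL for interrupt_level_n interrupts) are described in Conditioning of Disrupting Traps and are not modeled. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum exec_mode { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED };

/* True if a pending disrupting exception may cause a trap now; otherwise it
 * remains pending. */
bool disrupting_trap_can_be_taken(bool destined_for_hyper,
                                  enum exec_mode mode, bool pstate_ie)
{
    if (destined_for_hyper)   /* always below hyperprivileged, or ie = 1 */
        return mode != MODE_HYPERPRIVILEGED || pstate_ie;
    /* destined for privileged: never in hyperprivileged mode, needs ie = 1 */
    return mode != MODE_HYPERPRIVILEGED && pstate_ie;
}
```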


5.7.7 Trap Level Register (TL) (PR 7)


The privileged Trap Level register (TL; FIGURE 5-36) specifies the current trap level. 
TL = 0 is the normal (nontrap) level of operation. TL > 0 implies that one or more 
traps are being processed. 


FIGURE 5-36 Trap Level Register 


The maximum valid value that the TL register may contain is MAXTL, which is always
equal to the number of supported trap levels beyond level 0.


IMPL. DEP. #101-V9-CS10: The architectural parameter MAXPTL is a constant for
each implementation; its legal values are from 2 to 6 (supporting from 2 to 6 levels of
saved trap state visible to privileged software). In a typical implementation,
MAXPTL = MAXPGL (see impl. dep. #401-S10). The architectural parameter MAXTL is a
constant for each implementation; its legal values are from 4 to 7 (supporting from 4
to 7 levels of saved trap state). Architecturally, MAXPTL must be ≥ 2, MAXTL must be ≥
4, and MAXTL must be > MAXPTL.


In an UltraSPARC Architecture 2005 implementation, MAXPTL = 2 and MAXTL = 6. See 
Chapter 12, Traps, for more details regarding the TL register. 


After a power-on reset (POR), TL is set to MAXTL.
The effect of writing to TL with a WRPR instruction is summarized in TABLE 5-22.


TABLE 5-22 Effect of WRPR of Value x to Register TL

                            Privilege Level when Executing WRPR
Value x Written with WRPR   Nonprivileged                 Privileged                  Hyperprivileged
x ≤ MAXPTL                  privileged_opcode exception   TL ← x                      TL ← x
MAXPTL < x ≤ MAXTL          privileged_opcode exception   TL ← MAXPTL                 TL ← x
                                                          (no exception generated)
x > MAXTL                   privileged_opcode exception   TL ← MAXPTL                 TL ← MAXTL
                                                          (no exception generated)    (no exception generated)


Writing the TL register with a WRPR instruction does not alter any other machine 
state; that is, it is not equivalent to taking a trap or returning from a trap. 


Programming | An UltraSPARC Architecture implementation only needs to
Note | implement sufficient bits in the TL register to encode the
maximum trap level value. In an implementation where
MAXTL < 7, bits 63:3 of data written to the TL register using
the WRPR instruction are ignored; only the least-significant
three bits (bits 2:0) of TL are actually written. For example, if
MAXTL = 6, writing a value of 09₁₆ to the TL register causes a
value of 1₁₆ to actually be stored in TL.


Implementation | MAXPTL = 2 for all UltraSPARC Architecture 2005 processors.
Note | Writing a value between 3 and 7 to the TL register in privileged
mode causes a 2 to be stored in TL.


Implementation | MAXTL = 6 for all UltraSPARC Architecture 2005 processors.
Note | Writing a value of 7 to the TL register in hyperprivileged mode
causes a 6 to be stored in TL.


Programming | Although it is possible for hyperprivileged software to set
Note | TL > MAXPTL for privileged or nonprivileged software†, an
UltraSPARC Architecture virtual processor's behavior when
executing with TL > MAXPTL outside of hyperprivileged mode is
undefined.


Although it is possible for privileged or hyperprivileged
software to set TL > 0 for nonprivileged software†, an
UltraSPARC Architecture virtual processor's behavior when
executing with TL > 0 in nonprivileged mode is undefined.


† By executing a WRPR to TSTATE followed by a DONE or RETRY instruction, or by
executing a JMPL/WRHPR instruction pair.
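TABLE 5-22 and the notes above can be combined into one C sketch of a WRPR to TL, using the UltraSPARC Architecture 2005 values MAXPTL = 2 and MAXTL = 6. It assumes, consistent with the 09₁₆ example, that bits 63:3 are discarded before the saturation in TABLE 5-22 is applied; the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXPTL 2u   /* UltraSPARC Architecture 2005 values */
#define MAXTL  6u

enum wrpr_mode { WR_NONPRIVILEGED, WR_PRIVILEGED, WR_HYPERPRIVILEGED };

/* Returns false for a privileged_opcode exception; otherwise stores the new
 * TL value in *tl (saturating, with no exception). */
bool wrpr_tl(uint64_t x, enum wrpr_mode mode, unsigned *tl)
{
    if (mode == WR_NONPRIVILEGED)
        return false;
    unsigned v = (unsigned)(x & 0x7);   /* only bits 2:0 are written */
    unsigned limit = (mode == WR_PRIVILEGED) ? MAXPTL : MAXTL;
    *tl = v > limit ? limit : v;
    return true;
}
```

Under these assumptions, a privileged write of 5 stores 2, a hyperprivileged write of 7 stores 6, and a hyperprivileged write of 09₁₆ stores 1.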
5.7.8 Processor Interrupt Level (PIL) Register (PR 8)


The privileged Processor Interrupt Level register (PIL; see FIGURE 5-37) specifies the
interrupt level above which the virtual processor will accept an interrupt_level_n
interrupt. Interrupt priorities are mapped so that interrupt level 2 has greater
priority than interrupt level 1, and so on. See TABLE 12-4 on page 475 for a list of
exception and interrupt priorities.
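A sketch of the acceptance test, assuming interrupt levels 1 to 15 and the 4-bit PIL shown in FIGURE 5-37; PSTATE.ie must also permit the trap, which this hypothetical helper does not check.

```c
#include <assert.h>
#include <stdbool.h>

/* interrupt_level_n is accepted only when n is above PIL. */
bool interrupt_level_accepted(unsigned n, unsigned pil)
{
    assert(n >= 1 && n <= 15);   /* assumed range of interrupt levels */
    assert(pil <= 15);           /* PIL occupies bits 3:0 */
    return n > pil;
}
```

With PIL = 15, even a level 15 interrupt is held off, which is consistent with the V9 Compatibility Note that follows.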


FIGURE 5-37 Processor Interrupt Level Register (bits 3:0)


V9 Compatibility | On SPARC V8 processors, the level 15 interrupt is considered to 
Note | be nonmaskable, so it has different semantics from other 
interrupt levels. SPARC V9 processors do not treat a level 15 
interrupt differently from other interrupt levels. 


5.7.9 Global Level Register (GL) (PR 16)


The privileged Global Level (GL) register selects which set of global registers is 
visible at any given time. 


FIGURE 5-38 illustrates the Global Level register. 


FIGURE 5-38 Global Level Register, GL (bits 2:0)


When a trap occurs, GL is stored in TSTATE[TL].gl, GL is incremented, and a new set 
of global registers (R[1] through R[7]) becomes visible. A DONE or RETRY 
instruction restores the value of GL from TSTATE[TL]. 


The valid range of values that the GL register may contain is 0 to MAXGL, where MAXGL is
one fewer than the number of global register sets available to the virtual processor.


IMPL. DEP. #401-S10: The architectural parameter MAXPGL is a constant for each
implementation; its legal values are from 2 to 7 (supporting from 3 to 8 sets of global
registers visible to privileged software). In a typical implementation,
MAXPGL = MAXPTL (see impl. dep. #101-V9-CS10). The architectural parameter MAXGL
is a constant for each implementation; its legal values are from 3 to 7 (supporting
from 4 to 8 sets of global registers). Architecturally, MAXPGL must be ≥ 2 and MAXGL
must be > MAXPGL.
In all UltraSPARC Architecture 2005 implementations, MAXPGL = 2 and MAXGL = 3
(impl. dep. #401-S10).


IMPL. DEP. #400-S10: Although GL is defined as a 3-bit register, an implementation 
may implement any subset of those bits sufficient to encode the values from 0 to 
MAXGL for that implementation. If any bits of GL are not implemented, they read as 
zero and writes to them are ignored. 


GL operates similarly to TL, in that it increments during entry to a trap, but the 
values of GL and TL are independent. That is, TL = n does not imply that GL = n, 
and GL = n does not imply that TL = n. Furthermore, there may be a different total 
number of global levels (register sets) than there are trap levels; that is, MAXTL and 
MAXGL are not necessarily equal. 


The GL register can be accessed directly with the RDPR and WRPR instructions (as 
privileged register number 16). Writing the GL register directly with WRPR will 
change the set of global registers visible to all instructions subsequent to the WRPR. 


In privileged mode, attempting to write a value greater than MAXPGL to the GL 
register causes MAXPGL to be written to GL. 


In hyperprivileged mode, attempting to write a value greater than MAXGL to the GL 
register causes MAXGL to be written to GL. 


When a DONE or RETRY instruction is executed in privileged mode and
HTSTATE[TL].hpstate.hpriv = 0 (which will cause the DONE or RETRY to return the
virtual processor to nonprivileged or privileged mode), the value of GL restored
from TSTATE[TL] saturates at MAXPGL. That is, if the value in TSTATE[TL].gl is
greater than MAXPGL, then MAXPGL is substituted and written to GL. This protects
against non-hyperprivileged software executing with GL > MAXPGL.
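The GL transitions on trap entry and on DONE/RETRY can be sketched as below, using MAXPGL = 2 and MAXGL = 3 from the text. The sketch models only the saturation at MAXGL on trap entry (described later in this section) and the MAXPGL limit on return; it assumes that limit applies whenever DONE or RETRY returns to nonprivileged or privileged mode, and it does not model any further limit that may apply to traps delivered to privileged mode. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXPGL 2u   /* UltraSPARC Architecture 2005 values */
#define MAXGL  3u

/* GL after trap entry: incremented, but never beyond MAXGL, so a trap taken
 * at GL = MAXGL leaves GL unchanged. */
unsigned gl_on_trap(unsigned gl)
{
    return gl < MAXGL ? gl + 1 : MAXGL;
}

/* GL after DONE/RETRY restores it from TSTATE[TL].gl. When the return target
 * is not hyperprivileged, the restored value saturates at MAXPGL. */
unsigned gl_on_done_retry(unsigned tstate_gl, bool returning_to_hyperprivileged)
{
    if (!returning_to_hyperprivileged && tstate_gl > MAXPGL)
        return MAXPGL;
    return tstate_gl;
}
```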


Programming | Although it is possible for hyperprivileged software to set
Note | GL > MAXPGL for privileged or nonprivileged software‡,
executing with GL > MAXPGL outside of hyperprivileged mode is
an illegal state and the behavior of a virtual processor in that
state is undefined.


‡ By executing a WRPR that modifies GL, followed by a JMPL/WRHPR instruction
pair (it is not possible to set GL > MAXPGL using DONE/RETRY).
The effect of writing to GL with a WRPR instruction is summarized in TABLE 5-23.


TABLE 5-23 Effect of WRPR to Register GL

                            Privilege Level when WRPR Is Executed
Value x Written with WRPR   Nonprivileged                 Privileged                  Hyperprivileged
x ≤ MAXPGL                  privileged_opcode exception   GL ← x                      GL ← x
MAXPGL < x ≤ MAXGL          privileged_opcode exception   GL ← MAXPGL                 GL ← x
                                                          (no exception generated)
x > MAXGL                   privileged_opcode exception   GL ← MAXPGL                 GL ← MAXGL
                                                          (no exception generated)    (no exception generated)











If MAXGL < MAXTL, then there are fewer sets of global registers than trap levels. In this
case, if a trap occurs while GL = MAXGL, GL will have the same value before the trap
occurs and in the software that handles the trap. Trap handler software must detect
this case and safely save any global register before the trap handler writes to it. The
Hyperprivileged Scratchpad registers (see Privileged Scratchpad Registers
(ASI_SCRATCHPAD) on page 445) may be useful in such cases.


Programming | An UltraSPARC Architecture implementation only needs to
Note | implement sufficient bits in the GL register to encode the
maximum global level value. In an implementation where
MAXGL ≤ 7, bits 63:3 of data written to the GL register using the
WRPR instruction are ignored; only the least-significant three
bits (bits 2:0) are actually written to GL. For example, if
MAXGL = 7, writing a value of 9₁₆ to the GL register causes a
value of 1₁₆ to actually be stored in GL.
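The arithmetic in this example is a mask of bits 2:0: 9₁₆ = 1001₂, and keeping only the low three bits leaves 001₂ = 1₁₆. A minimal C sketch, with a hypothetical function name, for an implementation that provides only GL bits 2:0:

```c
#include <stdint.h>

/* Only GL bits 2:0 exist in this hypothetical implementation, so bits 63:3
 * of the WRPR operand are simply dropped. */
uint32_t gl_implemented_bits(uint64_t wrpr_value)
{
    return (uint32_t)(wrpr_value & 0x7u);
}
```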





Since TSTATE itself is software-accessible, it is possible that when a DONE or 
RETRY is executed to return from a trap handler, the value of GL restored from 
TSTATE[TL] will be different from that which was saved into TSTATE[TL] when the 
trap occurred. 


During power-on reset (POR), the value of GL is set to MAXGL. During all other 
resets, GL is incremented (the same behavior as TL). 





5.8 HPR State Registers


The registers described in this section can be directly accessed with the 
hyperprivileged WRHPR and RDHPR instructions. 
An attempt to read or write any HPR state register (using RDHPR or WRHPR) in
privileged or nonprivileged modes (that is, when HPSTATE.hpriv = 0) causes an
illegal_instruction exception.


5.8.1 Hyperprivileged State (HPSTATE) Register (HPR 0)


The Hyperprivileged State register (HPSTATE), shown in FIGURE 5-38, contains 
hyperprivileged control fields for the virtual processor. There is one instance of the 
HPSTATE register per virtual processor. 


[63:12 (reserved) | 11 impl. dep. | 10 ibe (RW) | 9:6 (reserved) | 5 red (RW) | 4:3 (reserved) | 2 hpriv (RW) | 1 (reserved) | 0 tlz (RW)]

FIGURE 5-38 HPSTATE Fields


Writing HPSTATE is nondelayed; that is, new machine state written to HPSTATE is 
visible to the next instruction executed. The hyperprivileged RDHPR and WRHPR 
instructions are used to read and write HPSTATE, respectively. 
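A minimal C sketch for decoding the single-bit HPSTATE fields described in the following subsections. The bit positions (tlz = bit 0, hpriv = bit 2, red = bit 5, ibe = bit 10) are assumed from FIGURE 5-38; the macro, type, and function names are hypothetical. The second function flags the hpriv = 0, red = 1 combination that the Programming Note on HPSTATE.red calls an undefined operational state.

```c
#include <stdint.h>
#include <stdbool.h>

/* HPSTATE single-bit fields (positions assumed from FIGURE 5-38). */
#define HPSTATE_TLZ   (UINT64_C(1) << 0)
#define HPSTATE_HPRIV (UINT64_C(1) << 2)
#define HPSTATE_RED   (UINT64_C(1) << 5)
#define HPSTATE_IBE   (UINT64_C(1) << 10)

struct hpstate_fields {
    bool tlz, hpriv, red, ibe;
};

struct hpstate_fields hpstate_decode(uint64_t hpstate)
{
    struct hpstate_fields f;
    f.tlz   = (hpstate & HPSTATE_TLZ) != 0;
    f.hpriv = (hpstate & HPSTATE_HPRIV) != 0;
    f.red   = (hpstate & HPSTATE_RED) != 0;
    f.ibe   = (hpstate & HPSTATE_IBE) != 0;
    return f;
}

/* True for the hpriv = 0, red = 1 combination that software must never write. */
bool hpstate_is_undefined_combo(uint64_t hpstate)
{
    return (hpstate & HPSTATE_HPRIV) == 0 && (hpstate & HPSTATE_RED) != 0;
}
```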


The following subsections describe the fields contained in the HPSTATE register. 


IMPL. DEP. #408-S10: The contents and semantics of HPSTATE{11} are
implementation dependent. 


Instruction Breakpoint Enable (ibe). When HPSTATE.ibe = 1, the Instruction
Breakpoint feature is enabled, allowing an instr_breakpoint exception to occur. When
an instr_breakpoint exception trap occurs, the virtual processor sets HPSTATE.ibe to
0 before entering trap handler software, to guarantee that no additional
instr_breakpoint exception can occur in the instruction breakpoint trap handler
unless the trap handler explicitly reenables instruction breakpointing by setting
HPSTATE.ibe to 1.


RED_state (red). When HPSTATE.red is set to 1, the virtual processor is operating
in RED_state (Reset, Error, and Debug state). See RED_state on page 457. The
virtual processor sets HPSTATE.red when any hardware reset occurs. HPSTATE.red
is also set to 1 when a trap is taken while TL = (MAXTL - 1). Software can reliably exit
RED_state by one of two methods:








1. Execute a DONE or RETRY instruction, which restores the stacked copy of
HPSTATE and clears HPSTATE.red if it was 0 in the stacked copy.

2. Write a 0 to HPSTATE.red with a WRHPR instruction.

Programming | Software should not write 0 to HPSTATE.red in the delay slot of
Note | a DCTI (for example, a JMPL instruction). Exiting RED_state using a
DONE or RETRY instruction avoids this problem entirely.





Programming | HPSTATE.hpriv = 0 and HPSTATE.red = 1 is an undefined 
Note | operational state. Therefore, care should be taken never to write 
that combination of values to HPSTATE. 





Hyperprivileged mode (hpriv). When HPSTATE.hpriv = 1, the virtual processor 
is operating in hyperprivileged mode and ignores PSTATE.priv. 


When HPSTATE.hpriv = 0, the processor is operating in privileged or nonprivileged 
mode, as determined by PSTATE.priv. 


See the Programming Note on page 381, recommending that a WRHPR instruction
that changes HPSTATE.hpriv never be executed in the delay slot of a DCTI instruction.


Trap Level Zero trap enable (tlz). When HPSTATE.tlz = 0, generation of
trap_level_zero exceptions is disabled. When all three of the following conditions
exist, a trap_level_zero exception is generated:

• HPSTATE.tlz = 1 (generation of trap_level_zero is enabled)
• the virtual processor is in nonprivileged or privileged mode (HPSTATE.hpriv = 0)
• the trap level (TL) register's value is zero (TL = 0)
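The mode selection described under hpriv and the three trap_level_zero conditions above can be summarized in C. This is a sketch with hypothetical names, not a description of any particular implementation.

```c
#include <stdbool.h>

enum exec_mode { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED };

/* HPSTATE.hpriv = 1 selects hyperprivileged mode and PSTATE.priv is ignored;
 * otherwise PSTATE.priv chooses between privileged and nonprivileged. */
enum exec_mode current_mode(bool hpstate_hpriv, bool pstate_priv)
{
    if (hpstate_hpriv)
        return MODE_HYPERPRIVILEGED;
    return pstate_priv ? MODE_PRIVILEGED : MODE_NONPRIVILEGED;
}

/* All three listed conditions must hold for trap_level_zero to be generated. */
bool trap_level_zero_generated(bool hpstate_tlz, bool hpstate_hpriv, unsigned tl)
{
    return hpstate_tlz && !hpstate_hpriv && tl == 0;
}
```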


Programming | The purpose of trap_level_zero is to improve efficiency when
Note | descheduling a virtual processor. When a descheduling event
occurs and the virtual processor is executing in privileged mode
at TL > 0, hyperprivileged software can choose to enable the
trap_level_zero exception (set HPSTATE.tlz ← 1) and return to
privileged mode, enabling privileged software to complete its
TL > 0 processing. When privileged code returns to TL = 0, this
exception enables the hyperprivileged code to regain control
and deschedule the virtual processor with low overhead.





5.8.2 Hyperprivileged Trap State (HTSTATE) Register (HPR 1)


The Hyperprivileged Trap State register (HTSTATE; FIGURE 5-39) contains the 
hyperprivileged state from the previous trap level, comprising the contents of the 
HPSTATE register from the previous trap level. There are MAXTL instances of the 
HTSTATE register, but only one is accessible at a time. The current value in the TL 
register determines which instance of HTSTATE is accessible. 
HTSTATE[1]      HPSTATE from TL = 0
HTSTATE[2]      HPSTATE from TL = 1
HTSTATE[3]      HPSTATE from TL = 2
...
HTSTATE[MAXTL]  HPSTATE from TL = MAXTL - 1

[63:12 (reserved) | 11:0 saved HPSTATE]

FIGURE 5-39 Hyperprivileged Trap State Register


An attempt to read or write the HTSTATE register when TL = 0 causes an
illegal_instruction exception.


After a power-on reset the contents of HTSTATE[1] through HTSTATE[MAXTL] are 
undefined. During normal operation the value of HTSTATE[n], when n is greater 
than the current trap level (n > TL), is undefined. 


TABLE 5-24 lists the events that cause HTSTATE to be read or written. 


TABLE 5-24 Events that involve HTSTATE, when executing with TL = n

Event                | Effect
Trap                 | HTSTATE[n + 1]{10:0} ← HPSTATE
DONE instruction     | HPSTATE ← HTSTATE[n]{10:0}
RETRY instruction    | HPSTATE ← HTSTATE[n]{10:0}
RDHPR (HTSTATE)      | R[rd] ← HTSTATE[n]
WRHPR (HTSTATE)      | HTSTATE[n] ← value
Power-on reset (POR) | All HTSTATE values are left undefined
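The trap and DONE/RETRY rows of TABLE 5-24 behave like a per-virtual-processor stack indexed by TL. In this C sketch, MAXTL = 6 is an arbitrary example value, the names are hypothetical, and only the HTSTATE and TL updates are modeled. Other state that a trap or DONE/RETRY changes, including whatever the trap then writes into HPSTATE, is left out, and HPSTATE bits above bit 10 are simply cleared on restore.

```c
#include <stdint.h>

#define MAXTL 6                          /* example only: MAXTL is implementation dependent */
#define HTSTATE_SAVED UINT64_C(0x7FF)    /* HPSTATE bits 10:0, per TABLE 5-24 */

struct vp_state {
    unsigned tl;                         /* current trap level */
    uint64_t hpstate;
    uint64_t htstate[MAXTL + 1];         /* HTSTATE[1..MAXTL]; element 0 unused */
};

/* Trap taken at TL = n: HTSTATE[n + 1]{10:0} <- HPSTATE, then TL = n + 1. */
int take_trap(struct vp_state *vp)
{
    if (vp->tl >= MAXTL)
        return -1;                       /* trap at TL = MAXTL: outside this sketch */
    vp->htstate[vp->tl + 1] = vp->hpstate & HTSTATE_SAVED;
    vp->tl++;
    return 0;
}

/* DONE or RETRY at TL = n: HPSTATE <- HTSTATE[n]{10:0}, then TL = n - 1. */
int done_or_retry(struct vp_state *vp)
{
    if (vp->tl == 0)
        return -1;                       /* no HTSTATE instance is accessible at TL = 0 */
    vp->hpstate = vp->htstate[vp->tl] & HTSTATE_SAVED;
    vp->tl--;
    return 0;
}
```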


5.8.3 Hyperprivileged Interrupt Pending (HINTP) Register (HPR 3)


The hyperprivileged HINTP register provides a mechanism for hyperprivileged
software to determine that an hstick_match interrupt is pending while PSTATE.ie = 0
and to clear the interrupt without having to first set PSTATE.ie = 1 and take a
disrupting trap.
When HINTP.hsp = 1, a match between STICK and HSTICK_CMPR has occurred
while match interrupt generation was enabled (HSTICK_CMPR.int_dis = 0, see
System Tick Compare (STICK_CMPR) Register (ASR 25) on page 85), causing an
hstick_match exception to be generated.


Programming | A pending hstick_match exception can also be generated if
Note | software directly writes a '1' to HINTP.hsp.


When HINTP.hsp = 0, no interrupt is pending due to a match between STICK and
HSTICK_CMPR. Software can clear a pending hstick_match interrupt (indicated by
HINTP.hsp = 1) by writing 0 to HINTP.hsp.


The format of the HINTP register is illustrated in FIGURE 5-40. 


[63:1 (reserved) | 0 hsp (RW)]

FIGURE 5-40 Hyperprivileged Interrupt Pending (HINTP) Register Format


5.8.4 Hyperprivileged Trap Base Address (HTBA) Register (HPR 5)


The Hyperprivileged Trap Base Address register (HTBA), shown in FIGURE 5-41, 
provides the most significant 50 bits (bits 63:14) of the physical address used to 
select the trap vector for a trap that is to be serviced in hyperprivileged mode. The 
least significant 14 bits of HTBA always read as zero, and writes to them are ignored. 


[63:14 htba_high50 (RW) | 13:0 00 0000 0000 0000 (R)]

FIGURE 5-41 Hyperprivileged Trap Base Address Register


Details on how the full address for a trap vector is generated, using HTBA and other 
state, are provided in Trap-Table Entry Address to Hyperprivileged Mode on page 470. 


IMPL. DEP. #406-S10: It is implementation dependent whether all 50 bits of
HTBA{63:14} are implemented or if only bits n-1:14 are implemented. If the latter,
writes to bits 63:n are ignored and when HTBA is read, bits 63:n read as
sign-extended copies of the most significant implemented bit, HTBA{n-1}.
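To illustrate IMPL. DEP. #406-S10, the following C sketch computes what software would read back from HTBA after writing a value, for an implementation that provides only bits n-1:14. The function name is hypothetical, and n is a parameter rather than a value taken from any real processor.

```c
#include <stdint.h>

/* Value read back from HTBA after writing `written`, when only bits n-1:14
 * are implemented (15 <= n <= 64 assumed). */
uint64_t htba_readback(uint64_t written, unsigned n)
{
    uint64_t v = written & ~UINT64_C(0x3FFF);       /* bits 13:0 always read as 0 */
    if (n < 64) {
        uint64_t implemented = (UINT64_C(1) << n) - 1;
        v &= implemented;                            /* writes to bits 63:n are ignored */
        if (v & (UINT64_C(1) << (n - 1)))            /* bits 63:n copy bit n-1 */
            v |= ~implemented;
    }
    return v;
}
```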


See Chapter 12, Traps, for more details on trap vectors. 
5.8.5 Hyperprivileged Implementation Version (HVER) Register (HPR 6)


The Hyperprivileged Implementation Version register, shown in FIGURE 5-42, 
specifies the fixed parameters pertaining to a particular processor implementation 
and mask set. The HVER register is read-only, readable by the RDHPR instruction in 
hyperprivileged mode. 


[63:48 manuf (R) | 47:32 impl (R) | 31:24 mask (R) | 23:19 (reserved) | 18:16 maxgl (R) | 15:8 maxtl (R) | 7:5 (reserved) | 4:0 maxwin (R)]

FIGURE 5-42 Hyperprivileged Implementation Version Register


IMPL. DEP. #104-V9: HVER.manuf contains a 16-bit manufacturer code. This field is 
optional and if not present shall read as 0. HVER.manuf may indicate the original 
supplier of a second-sourced processor. It is intended that the contents of 
HVER.manuf track the JEDEC semiconductor manufacturer code as closely as 
possible. If the manufacturer does not have a JEDEC semiconductor manufacturer 
code, SPARC International will assign a value for HVER.manuf. 


IMPL. DEP. #13-V8: HVER.impl uniquely identifies an implementation or class of
software-compatible implementations of the architecture. Values FFF0₁₆-FFFF₁₆ are
reserved and are not available for assignment.


HVER.mask specifies the current mask set revision and is chosen by the 
implementor. It generally increases numerically with successive releases of the 
processor but does not necessarily increase by 1 for consecutive releases. 


Implementation | Conventionally, this field is die-specific, with bits 31:28 
Note | indicating the major mask revision number and bits 27:24 
indicating the minor mask revision number. 


HVER.maxgl contains the maximum number of levels of global register sets
supported by an implementation (impl. dep. #401-S10), that is, MAXGL, the maximum
value that the GL register may contain.


HVER.maxtl contains the maximum number of trap levels supported by an 
implementation (impl. dep. #101-V9-CS10), that is, MAXTL, the maximum value of the 
contents of the TL register. 


HVER.maxwin contains the maximum index number available for use as a valid
CWP value in an implementation; that is, HVER.maxwin contains the value
N_REG_WINDOWS - 1 (impl. dep. #2-V8).
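A C sketch of extracting the HVER fields. The field positions are taken from FIGURE 5-42; the struct and function names are hypothetical. N_REG_WINDOWS follows from HVER.maxwin + 1, and by the convention in the Implementation Note above, mask >> 4 and mask & 0xF would give the major and minor mask revisions.

```c
#include <stdint.h>

struct hver_fields {
    uint16_t manuf;   /* bits 63:48 */
    uint16_t impl;    /* bits 47:32 */
    uint8_t  mask;    /* bits 31:24 */
    uint8_t  maxgl;   /* bits 18:16, MAXGL */
    uint8_t  maxtl;   /* bits 15:8, MAXTL */
    uint8_t  maxwin;  /* bits 4:0, N_REG_WINDOWS - 1 */
};

/* Extract bits hi:lo of v (field width is at most 16 here). */
static uint64_t bits(uint64_t v, unsigned hi, unsigned lo)
{
    return (v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

struct hver_fields hver_decode(uint64_t hver)
{
    struct hver_fields f;
    f.manuf  = (uint16_t)bits(hver, 63, 48);
    f.impl   = (uint16_t)bits(hver, 47, 32);
    f.mask   = (uint8_t)bits(hver, 31, 24);
    f.maxgl  = (uint8_t)bits(hver, 18, 16);
    f.maxtl  = (uint8_t)bits(hver, 15, 8);
    f.maxwin = (uint8_t)bits(hver, 4, 0);
    return f;
}

unsigned hver_n_reg_windows(struct hver_fields f)
{
    return f.maxwin + 1u;
}
```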


SPARC V9 | The SPARC V9 VER register was replaced in the UltraSPARC 
Compatibility | Architecture by the hyperprivileged HVER register. 
Note 
5.8.6 Hyperprivileged System Tick Compare (HSTICK_CMPR) Register (HPR 31)


The Hyperprivileged System Tick Compare (HSTICK_CMPR) register allows
hyperprivileged software to arrange for an hstick_match interrupt to occur when
the STICK register reaches a specified value while HSTICK_CMPR.int_dis = 0. While
executing in hyperprivileged mode with PSTATE.ie = 0, the interrupt is masked.


The Hyperprivileged System Tick Compare Register is illustrated in FIGURE 5-43. 


[63 int_dis (RW) | 62:0 hstick_cmpr (RW)]

FIGURE 5-43 HSTICK_CMPR Register


The fields of HSTICK_CMPR are described in TABLE 5-25.


TABLE 5-25 Bit Description of HSTICK_CMPR Register

Bit(s) | Field Name | Description

63 | int_dis | If int_dis = 0, a match between HSTICK_CMPR.hstick_cmpr and
STICK will cause hardware to set HINTP.hsp to 1. If int_dis = 1, this
behavior is disabled; when a match occurs, HINTP.hsp will not be
changed.

Programming | When int_dis = 1, an hstick_match interrupt can still occur if
Note | HINTP.hsp is set to 1 by software and the other prerequisite
conditions for triggering hstick_match are met.

62:0 | hstick_cmpr | Hyperprivileged System Tick Compare Field. When
HSTICK_CMPR.int_dis = 0 and the value in HSTICK_CMPR.hstick_cmpr
exactly matches the value in STICK.counter, HINTP.hsp is set to 1.
After that, if HINTP.hsp remains set to 1, the next time that
hyperprivileged interrupts are unmasked (HPSTATE.hpriv = 0 or
PSTATE.ie = 1), an hstick_match exception will occur.

Programming | HINTP.hsp must be set to 0 between the time an hstick_match
Note | trap occurs and the hstick_match trap handler returns.
Otherwise, a return from the trap handler could immediately
trigger another hstick_match trap. Refer to implementation-specific
documentation regarding whether hardware sets HINTP.hsp to 0
when the hstick_match trap is taken or HINTP.hsp must be set to 0
by hyperprivileged software in the hstick_match trap handler.
After a power-on reset trap, the int_dis bit is set to 1 (disabling Hyperprivileged
System Tick Compare interrupts), and the value of HSTICK_CMPR.hstick_cmpr is
undefined.
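The compare-and-deliver behavior in TABLE 5-25 can be sketched in C as two steps: setting HINTP.hsp on a match, then deciding whether a pending hstick_match can be taken. The macro and function names are hypothetical, and the sketch assumes stick_counter holds STICK.counter as a value that fits in bits 62:0.

```c
#include <stdint.h>
#include <stdbool.h>

#define HSTICK_CMPR_INT_DIS (UINT64_C(1) << 63)
#define HSTICK_CMPR_VALUE   (~HSTICK_CMPR_INT_DIS)    /* bits 62:0 */

/* Evaluated when STICK.counter changes: on an exact match with int_dis = 0,
 * hardware sets HINTP.hsp. With int_dis = 1, hsp is left unchanged. */
void hstick_compare(uint64_t hstick_cmpr, uint64_t stick_counter, bool *hintp_hsp)
{
    if ((hstick_cmpr & HSTICK_CMPR_INT_DIS) == 0 &&
        (hstick_cmpr & HSTICK_CMPR_VALUE) == stick_counter)
        *hintp_hsp = true;
}

/* A pending hstick_match is taken once hyperprivileged interrupts are
 * unmasked (HPSTATE.hpriv = 0 or PSTATE.ie = 1), regardless of int_dis. */
bool hstick_match_taken(bool hintp_hsp, bool hpstate_hpriv, bool pstate_ie)
{
    return hintp_hsp && (!hpstate_hpriv || pstate_ie);
}
```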
CHAPTER 6


Instruction Set Overview 





Instructions are fetched by the virtual processor from memory and are executed, 
annulled, or trapped. Instructions are encoded in 4 major formats and partitioned 
into 11 general categories. Instructions are described in the following sections: 


• Instruction Execution on page 113.
• Instruction Formats on page 114.
• Instruction Categories on page 115.





6.1 Instruction Execution


The instruction at the memory location specified by the program counter is fetched 
and then executed. Instruction execution may change program-visible virtual 
processor and/or memory state. As a side effect of its execution, new values are 
assigned to the program counter (PC) and the next program counter (NPC). 


An instruction may generate an exception if it encounters some condition that makes 
it impossible to complete normal execution. Such an exception may in turn generate 
a precise trap. Other events may also cause traps: an exception caused by a previous 
instruction (a deferred trap), an interrupt or asynchronous error (a disrupting trap), 
or a reset request (a reset trap). If a trap occurs, control is vectored into a trap table. 
See Chapter 12, Traps, for a detailed description of exception and trap processing. 


If a trap does not occur and the instruction is not a control transfer, the next program 
counter is copied into the PC, and the NPC is incremented by 4 (ignoring arithmetic 
overflow if any). There are two types of control-transfer instructions (CTIs): delayed 
and immediate. For a delayed CTI, at the end of the execution of the instruction, 
NPC is copied into the PC and the target address is copied into NPC. For an 
immediate CTI, at the end of execution, the target is copied to PC and target + 4 is 
copied to NPC. In the SPARC instruction set, many CTIs do not transfer control until 
after a delay of one instruction, hence the term "delayed CTI" (DCTI). Thus, the two 
program counters provide for a delayed-branch execution model. 
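The PC/NPC update rules above can be written out directly. This C sketch uses hypothetical names and covers only the three cases in the paragraph; annulled delay slots and traps are not modeled. Unsigned 64-bit arithmetic wraps, which matches "ignoring arithmetic overflow".

```c
#include <stdint.h>

struct pcs { uint64_t pc, npc; };

/* Not a CTI, no trap: PC <- NPC, NPC <- NPC + 4. */
void advance_sequential(struct pcs *p)
{
    p->pc = p->npc;
    p->npc = p->npc + 4;
}

/* Delayed CTI: the instruction at the old NPC (the delay slot) runs next,
 * then control reaches the target. */
void advance_delayed_cti(struct pcs *p, uint64_t target)
{
    p->pc = p->npc;
    p->npc = target;
}

/* Immediate CTI: PC <- target, NPC <- target + 4. */
void advance_immediate_cti(struct pcs *p, uint64_t target)
{
    p->pc = target;
    p->npc = target + 4;
}
```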




For each instruction access and each normal data access, an 8-bit address space 
identifier (ASI) is appended to the 64-bit memory address. Load/store alternate
instructions (see Address Space Identifiers (ASIs) on page 122) can provide an arbitrary 
ASI with their data addresses or can use the ASI value currently contained in the 
ASI register. 





6.2 Instruction Formats


Every instruction is encoded in a single 32-bit word. The most typical 32-bit
formats are shown in FIGURE 6-1. For detailed formats for specific
instructions, see individual instruction descriptions in the Instructions chapter.


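op = 00₂: SETHI, Branches, and ILLTRAP
[31:30 op | 29:25 rd | 24:22 op2 | 21:0 imm22]
(SETHI layout shown; the branch forms subdivide bits 29:0 differently, see the
individual instruction descriptions)

op = 01₂: CALL
[31:30 op | 29:0 disp30]

op = 10₂ or 11₂: Arithmetic, Logical, Moves, Tcc, Loads, Stores, Prefetch, and Misc
[31:30 op | 29:25 rd | 24:19 op3 | 18:14 rs1 | 13 i = 0 | 12:5 imm_asi | 4:0 rs2]
[31:30 op | 29:25 rd | 24:19 op3 | 18:14 rs1 | 13 i = 1 | 12:0 simm13]

FIGURE 6-1 Summary of Instruction Formats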
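A C sketch that extracts the fields shown in FIGURE 6-1 from a 32-bit instruction word. Only the SETHI layout of the op = 00₂ group is decoded, since branch layouts differ; the function and struct names are hypothetical.

```c
#include <stdint.h>

/* Extract bits hi:lo of a 32-bit instruction word (widths used here are <= 30). */
static uint32_t field(uint32_t insn, unsigned hi, unsigned lo)
{
    return (insn >> lo) & ((UINT32_C(1) << (hi - lo + 1)) - 1u);
}

unsigned insn_op(uint32_t insn) { return field(insn, 31, 30); }

/* op = 00: SETHI layout. */
void decode_sethi(uint32_t insn, unsigned *rd, unsigned *op2, uint32_t *imm22)
{
    *rd = field(insn, 29, 25);
    *op2 = field(insn, 24, 22);
    *imm22 = field(insn, 21, 0);
}

/* op = 01: CALL carries a single 30-bit displacement. */
uint32_t decode_call_disp30(uint32_t insn) { return field(insn, 29, 0); }

struct fmt3 {               /* op = 10 or 11 */
    unsigned rd, op3, rs1, i, imm_asi, rs2;
    int32_t simm13;         /* meaningful only when i = 1 */
};

struct fmt3 decode_fmt3(uint32_t insn)
{
    struct fmt3 f;
    f.rd = field(insn, 29, 25);
    f.op3 = field(insn, 24, 19);
    f.rs1 = field(insn, 18, 14);
    f.i = field(insn, 13, 13);
    f.imm_asi = field(insn, 12, 5);
    f.rs2 = field(insn, 4, 0);
    f.simm13 = (int32_t)field(insn, 12, 0);
    if (f.simm13 & 0x1000)
        f.simm13 -= 0x2000;             /* sign-extend the 13-bit immediate */
    return f;
}
```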
6.3 Instruction Categories


UltraSPARC Architecture instructions can be grouped into the following categories:


Memory access
Memory synchronization
Integer arithmetic
Control transfer (CTI)
Conditional moves
Register window management
State register access
Privileged register access
Floating-point operate
Implementation dependent
Reserved


These categories are described in the following subsections.


6.3.1 Memory Access Instructions


Load, store, load-store, and PREFETCH instructions are the only instructions that
access memory. All of the memory access instructions except CASA, CASXA, and
Partial Store use either two R registers or an R register and simm13 to calculate a
64-bit byte memory address. The exceptions use a single R register instead; for
example, Compare and Swap specifies its 64-bit byte memory address with one R
register. To this 64-bit address, an ASI is appended that encodes address space
information.


The destination field of a memory reference instruction specifies the R or F 
register(s) that supply the data for a store or that receive the data from a load or 
LDSTUB. For SWAP, the destination register identifies the R register to be 
exchanged atomically with the calculated memory location. For Compare and Swap, 
an R register is specified, the value of which is compared with the value in memory 
at the computed address. If the values are equal, then the destination field specifies 
the R register that is to be exchanged atomically with the addressed memory 
location. If the values are unequal, then the destination field specifies the R register 
that is to receive the value at the addressed memory location; in this case, the 
addressed memory location remains unchanged. LDFSR/LDXFSR and STFSR/STXFSR
are special load and store instructions that load or store the floating-point
status register, FSR, instead of acting on an R or F register.


The destination field of a PREFETCH instruction (fcn) is used to encode the type of 
the prefetch. 
Memory is byte (8-bit) addressable. Integer load and store instructions support byte,
halfword (2 bytes), word (4 bytes), and doubleword/extended-word (8 bytes)
accesses. Floating-point load and store instructions support word, doubleword, and
quadword memory accesses. LDSTUB accesses bytes, SWAP accesses words, CASA
accesses words, and CASXA accesses doublewords. The LDTXA (load
twin-extended-word) instruction accesses a quadword (16 bytes) in memory. Block
loads and stores access 64-byte aligned data. PREFETCH accesses at least 64 bytes.


Programming | For some instructions, by use of simm13, any location in the 
Note | lowest or highest 4 Kbytes of an address space can be accessed 
without the use of a register to hold part of the address. 


6.3.1.1 Memory Alignment Restrictions


A halfword access must be aligned on a 2-byte boundary, a word access (including
an instruction fetch) must be aligned on a 4-byte boundary, an extended-word (LDX,
LDXA, STX, STXA) or integer twin word (LDTW, LDTWA, STTW, STTWA) access
must be aligned on an 8-byte boundary, an integer twin-extended-word (LDTXA)
access must be aligned on a 16-byte boundary, and a Block Load (LDBLOCKF) or
Store (STBLOCKF) access must be aligned on a 64-byte boundary.
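These rules reduce to testing the low address bits against a power-of-two boundary. The enumeration in this C sketch is a hypothetical classification of the accesses named above; floating-point doubleword and quadword accesses, which have the relaxed rules described next, are deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

enum access_kind {
    ACC_BYTE,
    ACC_HALFWORD,
    ACC_WORD,            /* includes instruction fetch */
    ACC_EXTENDED,        /* LDX, LDXA, STX, STXA */
    ACC_TWIN_WORD,       /* LDTW, LDTWA, STTW, STTWA */
    ACC_TWIN_EXTENDED,   /* LDTXA */
    ACC_BLOCK            /* LDBLOCKF, STBLOCKF */
};

uint64_t required_alignment(enum access_kind k)
{
    switch (k) {
    case ACC_BYTE:          return 1;
    case ACC_HALFWORD:      return 2;
    case ACC_WORD:          return 4;
    case ACC_EXTENDED:
    case ACC_TWIN_WORD:     return 8;
    case ACC_TWIN_EXTENDED: return 16;
    case ACC_BLOCK:         return 64;
    }
    return 1;
}

/* Aligned when the low log2(alignment) address bits are all zero. */
bool is_aligned(uint64_t addr, enum access_kind k)
{
    return (addr & (required_alignment(k) - 1)) == 0;
}
```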


A floating-point doubleword access (LDDF, LDDFA, STDF, STDFA) should be
aligned on an 8-byte boundary, but is only required to be aligned on a word (4-byte)
boundary. A floating-point doubleword access to an address that is 4-byte aligned
but not 8-byte aligned may result in less efficient and nonatomic access (the access
causes a trap and is emulated in software; impl. dep. #109-V9-Cs10), so 8-byte
alignment is recommended.


A floating-point quadword access (LDQF, LDQFA, STQF, STQFA) should be aligned
on a 16-byte boundary, but is only required to be aligned on a word (4-byte)
boundary. A floating-point quadword access to an address that is 4-byte or 8-byte
aligned but not 16-byte aligned may result in less efficient and nonatomic access
(the access causes a trap and is emulated in software; impl. dep. #111-V9-Cs10), so
16-byte alignment is recommended.


An improperly aligned address in a load, store, or load-store instruction causes a
mem_address_not_aligned exception to occur, with these exceptions:

• An LDDF or LDDFA instruction accessing an address that is word aligned but not
doubleword aligned may cause an LDDF_mem_address_not_aligned exception
(impl. dep. #109-V9-Cs10).

• An STDF or STDFA instruction accessing an address that is word aligned but not
doubleword aligned may cause an STDF_mem_address_not_aligned exception
(impl. dep. #110-V9-Cs10).

• An LDQF or LDQFA instruction accessing an address that is word aligned but not
quadword aligned may cause an LDQF_mem_address_not_aligned exception
(impl. dep. #111-V9-Cs10a).

Implementation | Although the architecture provides for the
Note | LDQF_mem_address_not_aligned exception, UltraSPARC
Architecture 2005 implementations do not currently generate it.

• An STQF or STQFA instruction accessing an address that is word aligned but not
quadword aligned may cause an STQF_mem_address_not_aligned exception
(impl. dep. #112-V9-Cs10a).

Implementation | Although the architecture provides for the
Note | STQF_mem_address_not_aligned exception, UltraSPARC
Architecture 2005 implementations do not currently generate it.


6.3.1.2 Addressing Conventions 


An UltraSPARC Architecture virtual processor uses big-endian byte order for all 
instruction accesses and, by default, for data accesses. It is possible to access data in 
little-endian format by use of selected ASIs. It is also possible to change the default 
byte order for implicit data accesses. See Processor State (PSTATE) Register (PR 6) on
page 95 for more information.¹


Big-endian Addressing Convention. Within a multiple-byte integer, the byte 
with the smallest address is the most significant; a byte's significance decreases as its 
address increases. The big-endian addressing conventions are described in TABLE 6-1 
and illustrated in FIGURE 6-2. 


TABLE 6-1 Big-endian Addressing Conventions

Term | Definition

byte | A load/store byte instruction accesses the addressed byte in both big- and
little-endian modes.

halfword | For a load/store halfword instruction, two bytes are accessed. The most
significant byte (bits 15-8) is accessed at the address specified in the
instruction; the least significant byte (bits 7-0) is accessed at the
address + 1.

word | For a load/store word instruction, four bytes are accessed. The most
significant byte (bits 31-24) is accessed at the address specified in the
instruction; the least significant byte (bits 7-0) is accessed at the
address + 3.

doubleword or extended word | For a load/store extended or floating-point load/store
double instruction, eight bytes are accessed. The most significant byte (bits 63-56)
is accessed at the address specified in the instruction; the least significant byte
(bits 7-0) is accessed at the address + 7.
For the deprecated integer load/store twin word instructions (LDTW,
LDTWA*, STTW, STTWA), two big-endian words are accessed. The word
at the address specified in the instruction corresponds to the even register
specified in the instruction; the word at address + 4 corresponds to the
following odd-numbered register.

* Note that the LDTXA instruction, which is not an LDTWA operation but does share
LDTWA's opcode, is not deprecated.

quadword | For a load/store quadword instruction, 16 bytes are accessed. The most
significant byte (bits 127-120) is accessed at the address specified in the
instruction; the least significant byte (bits 7-0) is accessed at the
address + 15.

¹ Readers interested in more background information on big- vs. little-endian can also
refer to Cohen, D., "On Holy Wars and a Plea for Peace," Computer 14:10 (October 1981),
pp. 48-54.
[Figure: byte layout of a byte, a halfword (Address{0}), a word (Address{1:0}), a
doubleword/extended word (Address{2:0}), and a quadword (Address{3:0}). In every case
the byte at the lowest address holds the most significant bits; in a quadword, the byte
at Address{3:0} = 0000 holds bits 127:120 and the byte at 1111 holds bits 7:0.]

FIGURE 6-2 Big-endian Addressing Conventions
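The big-endian convention in TABLE 6-1 amounts to accumulating bytes starting from the lowest address. A C sketch with a hypothetical function name, for accesses of up to 8 bytes:

```c
#include <stdint.h>
#include <stddef.h>

/* Big-endian: the byte at the lowest address is the most significant.
 * nbytes must be at most 8. */
uint64_t load_big_endian(const uint8_t *mem, size_t nbytes)
{
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | mem[i];
    return v;
}
```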


Little-endian Addressing Convention. Within a multiple-byte integer, the byte 
with the smallest address is the least significant; a byte's significance increases as its 
address increases. The little-endian addressing conventions are defined in TABLE 6-2 
and illustrated in FIGURE 6-3. 


TABLE 6-2 Little-endian Addressing Conventions

Term | Definition

byte | A load/store byte instruction accesses the addressed byte in both big-
and little-endian modes.

halfword | For a load/store halfword instruction, two bytes are accessed. The least
significant byte (bits 7-0) is accessed at the address specified in the
instruction; the most significant byte (bits 15-8) is accessed at the
address + 1.

word | For a load/store word instruction, four bytes are accessed. The least
significant byte (bits 7-0) is accessed at the address specified in the
instruction; the most significant byte (bits 31-24) is accessed at the
address + 3.

doubleword or extended word | For a load/store extended or floating-point load/store
double instruction, eight bytes are accessed. The least significant byte (bits 7-0)
is accessed at the address specified in the instruction; the most significant
byte (bits 63-56) is accessed at the address + 7.
For the deprecated integer load/store twin word instructions (LDTW,
LDTWA*, STTW, STTWA), two little-endian words are accessed. The
word at the address specified in the instruction corresponds to the even
register in the instruction; the word at the address specified in the
instruction + 4 corresponds to the following odd-numbered register. With
respect to little-endian memory, an LDTW/LDTWA (STTW/STTWA)
instruction behaves as if it is composed of two 32-bit loads (stores), each
of which is byte-swapped independently before being written into each
destination register (memory word).

* Note that the LDTXA instruction, which is not an LDTWA operation but does share
LDTWA's opcode, is not deprecated.

quadword | For a load/store quadword instruction, 16 bytes are accessed. The least
significant byte (bits 7-0) is accessed at the address specified in the
instruction; the most significant byte (bits 127-120) is accessed at the
address + 15.
[Figure: byte layout of a byte, a halfword (Address{0}), a word (Address{1:0}), a
doubleword/extended word (Address{2:0}), and a quadword (Address{3:0}). In every case
the byte at the lowest address holds the least significant bits; in a quadword, the byte
at Address{3:0} = 0000 holds bits 7:0 and the byte at 1111 holds bits 127:120.]

FIGURE 6-3 Little-endian Addressing Conventions

6.3.1.3 Address Space Identifiers (ASIs) 


Alternate-space load, store, and load-store instructions specify an explicit ASI to use
for their data access; when i = 0, the explicit ASI is provided in the instruction's
imm_asi field, and when i = 1, it is provided in the ASI register.


Non-alternate-space load, store, and load-store instructions use an implicit ASI value 
that depends on the current trap level (TL) and the value of PSTATE.cle. Instruction 
fetches use an implicit ASI that depends only on the current trap level. The cases are 
enumerated in TABLE 6-3. Note that in hyperprivileged mode, all accesses are performed 
using physical addresses, so there is no implicit ASI in hyperprivileged mode. 


TABLE 6-3 ASIs Used for Data Accesses and Instruction Fetches in Nonprivileged and
Privileged Modes

Access Type                                  | TL  | PSTATE.cle | ASI Used
Instruction Fetch                            | = 0 | any        | ASI_PRIMARY
Instruction Fetch                            | > 0 | any        | ASI_NUCLEUS*
Non-alternate-space Load, Store, or Load-Store | = 0 | 0        | ASI_PRIMARY
Non-alternate-space Load, Store, or Load-Store | = 0 | 1        | ASI_PRIMARY_LITTLE
Non-alternate-space Load, Store, or Load-Store | > 0 | 0        | ASI_NUCLEUS*
Non-alternate-space Load, Store, or Load-Store | > 0 | 1        | ASI_NUCLEUS_LITTLE**
Alternate-space Load, Store, or Load-Store   | any | any        | ASI explicitly specified in the instruction
                                             |     |            | (subject to privilege-level restrictions)

* On some early SPARC V9 implementations, ASI_PRIMARY may have been used for this case.
** On some early SPARC V9 implementations, ASI_PRIMARY_LITTLE may have been used for this case.
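A C sketch of the implicit-ASI selection in TABLE 6-3. The numeric ASI values are the standard SPARC V9 assignments and are not stated in this section; the function name is hypothetical. Hyperprivileged mode is excluded because it has no implicit ASI.

```c
#include <stdint.h>
#include <stdbool.h>

/* Standard SPARC V9 ASI numbers for the implicit cases in TABLE 6-3. */
#define ASI_NUCLEUS         0x04
#define ASI_NUCLEUS_LITTLE  0x0C
#define ASI_PRIMARY         0x80
#define ASI_PRIMARY_LITTLE  0x88

/* Implicit ASI in nonprivileged or privileged mode. */
uint8_t implicit_asi(bool is_fetch, unsigned tl, bool pstate_cle)
{
    bool little = !is_fetch && pstate_cle;   /* fetches ignore PSTATE.cle */
    if (tl == 0)
        return little ? ASI_PRIMARY_LITTLE : ASI_PRIMARY;
    return little ? ASI_NUCLEUS_LITTLE : ASI_NUCLEUS;
}
```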
See also Memory Addressing and Alternate Address Spaces on page 403.


ASIs 00₁₆-7F₁₆ are restricted; only software with sufficient privilege is allowed to
access them. ASIs 00₁₆-2F₁₆ are accessible by both privileged and hyperprivileged
software, while ASIs 30₁₆-7F₁₆ are accessible only by hyperprivileged software. An
attempt to access a restricted ASI by insufficiently-privileged software results in a
privileged_action exception (impl. dep. #103-V9-Ms10(6)). ASIs 80₁₆ through FF₁₆ are
unrestricted; software is allowed to access them regardless of the virtual processor's
privilege mode, as summarized in TABLE 6-4.





TABLE 6-4 Allowed Accesses to ASIs

Value     | Access Type                  | Processor Mode (HPSTATE.hpriv, PSTATE.priv) | Result of ASI Access
00₁₆-2F₁₆ | Restricted (Privileged)      | Nonprivileged (0,0)   | privileged_action exception
          |                              | Privileged (0,1)      | Valid access
          |                              | Hyperprivileged (1,x) | Valid access
30₁₆-7F₁₆ | Restricted (Hyperprivileged) | Nonprivileged (0,0)   | privileged_action exception
          |                              | Privileged (0,1)      | data_access_exception exception
          |                              | Hyperprivileged (1,x) | Valid access
80₁₆-FF₁₆ | Unrestricted                 | Nonprivileged (0,0)   | Valid access
          |                              | Privileged (0,1)      | Valid access
          |                              | Hyperprivileged (1,x) | Valid access
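TABLE 6-4 can be encoded as a short decision procedure. In this C sketch the names are hypothetical, and hpriv and priv stand for HPSTATE.hpriv and PSTATE.priv.

```c
#include <stdint.h>

enum asi_result { ASI_VALID, ASI_PRIVILEGED_ACTION, ASI_DATA_ACCESS_EXCEPTION };

enum asi_result check_asi_access(uint8_t asi, int hpriv, int priv)
{
    if (hpriv)
        return ASI_VALID;                   /* hyperprivileged: every ASI */
    if (asi >= 0x80)
        return ASI_VALID;                   /* 80-FF: unrestricted */
    if (!priv)
        return ASI_PRIVILEGED_ACTION;       /* nonprivileged, restricted ASI */
    if (asi >= 0x30)
        return ASI_DATA_ACCESS_EXCEPTION;   /* privileged, 30-7F */
    return ASI_VALID;                       /* privileged, 00-2F */
}
```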


IMPL. DEP. #29-V8: Some UltraSPARC Architecture 2005 ASIs are implementation 
dependent. See TABLE 10-1 on page 423 for details. 


V9 Compatibility | In SPARC V9, many ASIs were defined to be implementation 
Note | dependent. 


An UltraSPARC Architecture implementation decodes all 8 bits of ASI specifiers
(impl. dep. #30-V8-Cu3).


V9 Compatibility | In SPARC V9, an implementation could choose to decode only a
Note | subset of the 8-bit ASI specifier.





6.3.1.4 Separate Instruction Memory 


A SPARC V9 implementation may choose to access instruction and data through the 
same address space and use hardware to keep data and instruction memory 
consistent at all times. It may also choose to overload independent address spaces 
for data and instructions and allow them to become inconsistent when data writes 
are made to addresses shared with the instruction space. 
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6.3.2 


6.3.3 


Programming | A SPARC V9 program containing self-modifying code should 
Note | use FLUSH instruction(s) after executing stores to modify 
instruction memory and before executing the modified 
instruction(s), to ensure the consistency of program execution. 


Memory Synchronization Instructions 


Two forms of memory barrier (MEMBAR) instructions allow programs to manage 
the order and completion of memory references. Ordering MEMBARs induce a 
partial ordering between sets of loads and stores and future loads and stores. 
Sequencing MEMBARs exert explicit control over completion of loads and stores (or 
other instructions). Both barrier forms are encoded in a single instruction, with 
subfunctions bit-encoded in cmask and mmask fields. 
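As a sketch of how one instruction carries both barrier forms, the C fragment below splits a 7-bit MEMBAR mask into its ordering (mmask) and sequencing (cmask) parts. The bit values are those of the SPARC V9 assembler mnemonics (#LoadLoad, #StoreLoad, #LoadStore, #StoreStore in mmask; #Lookaside, #MemIssue, #Sync in cmask); the MEMBAR_* names and helper functions are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Mask bits as used by the SPARC V9 assembler mnemonics.
   The C names are illustrative. */
enum {
    /* mmask (bits 3:0): ordering constraints */
    MEMBAR_LOADLOAD   = 0x01,  /* #LoadLoad   */
    MEMBAR_STORELOAD  = 0x02,  /* #StoreLoad  */
    MEMBAR_LOADSTORE  = 0x04,  /* #LoadStore  */
    MEMBAR_STORESTORE = 0x08,  /* #StoreStore */
    /* cmask (bits 6:4): sequencing constraints */
    MEMBAR_LOOKASIDE  = 0x10,  /* #Lookaside  */
    MEMBAR_MEMISSUE   = 0x20,  /* #MemIssue   */
    MEMBAR_SYNC       = 0x40   /* #Sync       */
};

/* Ordering subfunctions selected by a combined mask. */
static uint8_t membar_mmask(uint8_t mask) {
    assert(mask <= 0x7F);           /* only 7 bits are defined */
    return mask & 0x0F;
}

/* Sequencing subfunctions selected by a combined mask. */
static uint8_t membar_cmask(uint8_t mask) {
    assert(mask <= 0x7F);
    return (mask >> 4) & 0x07;
}
```

A single barrier such as `membar #StoreLoad | #Sync` therefore requests one ordering constraint and one sequencing constraint at once.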


6.3.3 Integer Arithmetic and Logical Instructions


The integer arithmetic and logical instructions generally compute a result that is a 
function of two source operands and either write the result in a third (destination) 
register R[rd] or discard it. The first source operand is R[rs1]. The second source 
operand depends on the i bit in the instruction; if i = 0, then the second operand is 
R[rs2]; if i = 1, then the second operand is the constant simm10, simm11, or simm13 
from the instruction itself, sign-extended to 64 bits. 


Note | The value of R[0] always reads as zero, and writes to it are 
ignored. 
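As an illustration, the C fragment below models how the second source operand is chosen for the common simm13 case, including the rule that R[0] reads as zero. The register-file array, the helper names, and the restriction to a 13-bit immediate are assumptions of this sketch; the simm10 and simm11 forms would differ only in the field width passed to sign_extend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends the low `bits` bits of `field` to 64 bits. */
static uint64_t sign_extend(uint64_t field, unsigned bits) {
    assert(bits > 0 && bits < 64);
    uint64_t sign = 1ULL << (bits - 1);
    field &= (1ULL << bits) - 1;    /* keep only the immediate's bits */
    return (field ^ sign) - sign;   /* two's-complement sign extension */
}

/* Hypothetical register-file read: R[0] always reads as zero. */
static uint64_t read_r(const uint64_t regs[32], unsigned n) {
    assert(n < 32);
    return n == 0 ? 0 : regs[n];
}

/* Second operand: R[rs2] if i = 0, sign-extended simm13 if i = 1. */
static uint64_t operand2(const uint64_t regs[32], bool i, unsigned rs2,
                         uint32_t simm13) {
    return i ? sign_extend(simm13, 13) : read_r(regs, rs2);
}
```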


6.3.3.1 Setting Condition Codes


Most integer arithmetic instructions have two versions: one sets the integer 
condition codes (icc and xcc) as a side effect; the other does not affect the condition 
codes. A special comparison instruction for integer values is not needed since it is 
easily synthesized with the “subtract and set condition codes" (SUBcc) instruction. 
See Synthetic Instructions on page 616 for details. 
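A minimal C model of SUBcc, assuming the usual SPARC meanings of the n, z, v, and c bits (c is the borrow on subtraction), shows how `cmp a, b` (that is, `subcc a, b, %g0`) yields both a 32-bit (icc) and a 64-bit (xcc) view of the same comparison. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One condition-code set: negative, zero, overflow, carry (borrow). */
typedef struct { bool n, z, v, c; } cc_t;

typedef struct { uint64_t result; cc_t icc, xcc; } subcc_t;

/* Flags for a - b evaluated at the given width (32 for icc, 64 for xcc). */
static cc_t sub_flags(uint64_t a, uint64_t b, unsigned width) {
    assert(width == 32 || width == 64);
    uint64_t mask = (width == 64) ? UINT64_MAX : 0xFFFFFFFFULL;
    uint64_t sign = 1ULL << (width - 1);
    a &= mask;
    b &= mask;
    uint64_t r = (a - b) & mask;
    cc_t cc = {
        .n = (r & sign) != 0,
        .z = r == 0,
        .v = ((a ^ b) & (a ^ r) & sign) != 0, /* signed overflow */
        .c = a < b,                            /* borrow out */
    };
    return cc;
}

/* SUBcc: 64-bit difference plus both condition-code sets. */
static subcc_t subcc(uint64_t a, uint64_t b) {
    return (subcc_t){ a - b, sub_flags(a, b, 32), sub_flags(a, b, 64) };
}

/* Signed "greater" condition: not (Z or (N xor V)). */
static bool cond_g(cc_t cc) {
    return !(cc.z || (cc.n != cc.v));
}
```

With a = 0x80000000 and b = 1, for example, icc reports signed overflow while xcc does not, which is why 32-bit and 64-bit code test different condition-code sets.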


6.3.3.2 Shift Instructions


Shift instructions shift an R register left or right by a constant or variable amount. 
None of the shift instructions change the condition codes. 


6.3.3.3 Set High 22 Bits of Low Word 


The "set high 22 bits of low word of an R register" instruction (SETHI) writes a 22-bit
constant from the instruction into bits 31 through 10 of the destination register. It
clears the low-order 10 bits and high-order 32 bits, and it does not affect the 
condition codes. Its primary use is to construct constants in registers. 
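The C sketch below models SETHI and the familiar two-instruction idiom that pairs it with OR to build a 32-bit constant; `%hi()` and `%lo()` are the SPARC assembler operators for the upper 22 and lower 10 bits. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* SETHI: imm22 goes to bits 31:10; bits 9:0 and 63:32 are cleared. */
static uint64_t sethi(uint32_t imm22) {
    assert(imm22 < (1u << 22));      /* the field is 22 bits wide */
    return (uint64_t)imm22 << 10;
}

/* Equivalents of the assembler's %hi() and %lo() operators. */
static uint32_t hi22(uint32_t value) { return value >> 10; }
static uint32_t lo10(uint32_t value) { return value & 0x3FF; }

/* Models:  sethi %hi(value), %rd
            or    %rd, %lo(value), %rd */
static uint64_t build_const32(uint32_t value) {
    return sethi(hi22(value)) | lo10(value);
}
```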


6.3.3.4 Integer Multiply/Divide 


The integer multiply instruction performs a 64 × 64 → 64-bit operation; the integer
divide instructions perform 64 ÷ 64 → 64-bit operations. For compatibility with
SPARC V8 processors, 32 × 32 → 64-bit multiply instructions, 64 ÷ 32 → 32-bit divide
instructions, and the Multiply Step instruction are provided. Division by zero causes 
a division_by_zero exception. 
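To make the operand widths concrete, here is a C sketch of three representatives: MULX (64 × 64 → 64, low half kept), UMUL (32 × 32 → 64), and UDIVX (64 ÷ 64 → 64). It assumes the SPARC V9 behavior of UMUL, which also copies the upper 32 bits of the product into the Y register; the result structs and function names are hypothetical conveniences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MULX-style 64 x 64 -> 64 multiply: only the low 64 bits are kept. */
static uint64_t mulx(uint64_t a, uint64_t b) { return a * b; }

/* UMUL-style 32 x 32 -> 64 multiply on the low 32 bits of each operand;
   the full product goes to R[rd] and its upper half to Y. */
typedef struct { uint64_t rd; uint32_t y; } umul_t;

static umul_t umul(uint64_t a, uint64_t b) {
    uint64_t p = (uint64_t)(uint32_t)a * (uint32_t)b;
    return (umul_t){ p, (uint32_t)(p >> 32) };
}

/* UDIVX-style 64 / 64 -> 64 divide; ok = false stands in for the
   division_by_zero exception. */
typedef struct { bool ok; uint64_t quotient; } divx_t;

static divx_t udivx(uint64_t a, uint64_t b) {
    if (b == 0)
        return (divx_t){ false, 0 };
    return (divx_t){ true, a / b };
}
```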


6.3.3.5 Tagged Add/Subtract 


The tagged add/subtract instructions assume tagged-format data, in which the tag is 
the two low-order bits of each operand. If either of the two operands has a nonzero 
tag or if 32-bit arithmetic overflow occurs, tag overflow is detected. If tag overflow 
occurs, then TADDcc and TSUBcc set the CCR.icc.v bit; if 64-bit arithmetic overflow 
occurs, then they set the CCR.xcc.v bit. 


The trapping versions (TADDccTV, TSUBccTV) of these instructions are deprecated. 
See Tagged Add on page 363 and Tagged Subtract on page 369 for details. 
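A C sketch of the TADDcc overflow rules stated above follows. Only the two overflow bits are modeled (the other condition-code bits are omitted), and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t result; bool icc_v, xcc_v; } tadd_t;

/* icc.v: nonzero tag in either operand, or 32-bit signed overflow.
   xcc.v: 64-bit signed overflow. */
static tadd_t taddcc(uint64_t a, uint64_t b) {
    uint64_t r = a + b;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, r32 = (uint32_t)r;
    /* Overflow: operands share a sign and the result's sign differs. */
    bool ovf32 = ((~(a32 ^ b32) & (a32 ^ r32)) >> 31) != 0;
    bool ovf64 = ((~(a ^ b) & (a ^ r)) >> 63) != 0;
    bool tag_nonzero = ((a | b) & 0x3) != 0;  /* tag = two low bits */
    return (tadd_t){ r, tag_nonzero || ovf32, ovf64 };
}
```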


6.3.4 Control-Transfer Instructions (CTIs)


The basic control-transfer instruction types are as follows: 


• Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)
• Unconditional branch
• Call and link (CALL)
• Jump and link (JMPL, RETURN)
• Return from trap (DONE, RETRY)
• Trap (Tcc)


A control-transfer instruction functions by changing the value of the next program 
counter (NPC) or by changing the value of both the program counter (PC) and the 
next program counter (NPC). When only NPC is changed, the effect of the transfer of 
control is delayed by one instruction. Most control transfers are of the delayed 
variety. The instruction following a delayed control-transfer instruction is said to be 
in the delay slot of the control-transfer instruction. 
Some control transfer instructions (branches) can optionally annul, that is, not
execute, the instruction in the delay slot, based on the setting of an annul bit in the 
instruction. The effect of the annul bit depends upon whether the transfer is taken 
or not taken and whether the branch is conditional or unconditional. Annulled 
delay instructions neither affect the program-visible state, nor can they cause a trap. 


Programming Note | The annul bit increases the likelihood that a compiler can find a useful instruction to fill the delay slot after a branch, thereby reducing the number of instructions executed by a program. For example, the annul bit can be used to move an instruction from within a loop to fill the delay slot of the branch that closes the loop.

Likewise, the annul bit can be used to move an instruction from either the "else" or "then" branch of an "if-then-else" program block to the delay slot of the branch that selects between them. Since a full set of conditions is provided, a compiler can arrange the code (possibly reversing the sense of the condition) so that an instruction from either the "else" branch or the "then" branch can be moved to the delay slot. Use of annulled branches provided some benefit in older, single-issue SPARC implementations. On an UltraSPARC Architecture implementation, the only benefit of annulled branches might be a slight reduction in code size. Therefore, the use of annulled branch instructions is no longer encouraged.


TABLE 6-5 defines the value of the program counter and the value of the next 
program counter after execution of each instruction. Conditional branches have two 
forms: branches that test a condition (including branch-on-register), represented in 
the table by Bcc, and branches that are unconditional, that is, always or never taken, 
represented in the table by BA and BN, respectively. The effect of an annulled branch 
is shown in the table through explicit transfers of control, rather than by fetching 
and annulling the instruction. 


TABLE 6-5 Control-Transfer Characteristics

Instruction Group   Address Form        Delayed   Taken   Annul Bit   New PC      New NPC
Non-CTIs            -                   -         -       -           NPC         NPC + 4
Bcc                 PC-relative         Yes       Yes     0           NPC         EA
Bcc                 PC-relative         Yes       No      0           NPC         NPC + 4
Bcc                 PC-relative         Yes       Yes     1           NPC         EA
Bcc                 PC-relative         Yes       No      1           NPC + 4     NPC + 8
BA                  PC-relative         Yes       Yes     0           NPC         EA
BA                  PC-relative         No        Yes     1           EA          EA + 4
BN                  PC-relative         Yes       No      0           NPC         NPC + 4
BN                  PC-relative         Yes       No      1           NPC + 4     NPC + 8
CALL                PC-relative         Yes       -       -           NPC         EA
JMPL, RETURN        Register-indirect   Yes       -       -           NPC         EA
DONE                Trap state          No        -       -           TNPC[TL]    TNPC[TL] + 4
RETRY               Trap state          No        -       -           TPC[TL]     TNPC[TL]
Tcc                 Trap vector         No        Yes     -           EA          EA + 4
Tcc                 Trap vector         No        No      -           NPC         NPC + 4





The effective address, "EA" in TABLE 6-5, specifies the target of the control-transfer 
instruction. The effective address is computed in different ways, depending on the 
particular instruction. 


• PC-relative effective address: A PC-relative effective address is computed by sign extending the instruction's immediate field to 64 bits, left-shifting the word displacement by 2 bits to create a byte displacement, and adding the result to the contents of the PC.

• Register-indirect effective address: A register-indirect effective address computes its target address as either R[rs1] + R[rs2] if i = 0, or R[rs1] + sign_ext(simm13) if i = 1.

• Trap vector effective address: A trap vector effective address first computes the software trap number as the least significant 7 or 8 bits of R[rs1] + R[rs2] if i = 0, or as the least significant 7 or 8 bits of R[rs1] + imm_trap# if i = 1. Whether 7 or 8 bits are used depends on the privilege level: 7 bits are used in nonprivileged mode and 8 bits are used in privileged and hyperprivileged modes. The trap level, TL, is incremented. The hardware trap type is computed as 256 + the software trap number and stored in TT[TL]. The effective address is generated by combining the contents of the TBA register with the trap type and other data; see Trap Processing on page 486 for details.


• Trap state effective address: A trap state effective address is not computed but
is taken directly from either TPC[TL] or TNPC[TL]. 
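The following C sketch computes the PC-relative and register-indirect effective addresses described above, using unsigned 64-bit arithmetic so that address wraparound is modulo 2^64. Address masking under PSTATE.am is not modeled; the displacement widths mentioned in the comments (disp22 for Bicc, disp30 for CALL) are examples, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends the low `bits` bits of `field` to 64 bits. */
static uint64_t sign_extend(uint64_t field, unsigned bits) {
    assert(bits > 0 && bits < 64);
    uint64_t sign = 1ULL << (bits - 1);
    field &= (1ULL << bits) - 1;
    return (field ^ sign) - sign;
}

/* PC-relative EA: sign-extend the word displacement, shift left by 2 to
   form a byte displacement, and add it to PC. disp_bits is the width of
   the displacement field (e.g. 22 for Bicc, 30 for CALL). */
static uint64_t pc_relative_ea(uint64_t pc, uint32_t disp, unsigned disp_bits) {
    return pc + (sign_extend(disp, disp_bits) << 2);
}

/* Register-indirect EA (JMPL, RETURN):
   R[rs1] + R[rs2] if i = 0, R[rs1] + sign_ext(simm13) if i = 1. */
static uint64_t reg_indirect_ea(uint64_t r_rs1, uint64_t r_rs2, bool i,
                                uint32_t simm13) {
    return r_rs1 + (i ? sign_extend(simm13, 13) : r_rs2);
}
```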


SPARC V8 Compatibility Note | The SPARC V8 architecture specified that the delay instruction was always fetched, even if annulled, and that an annulled instruction could not cause any traps. The SPARC V9 architecture does not require the delay instruction to be fetched if it is annulled.


6.3.4.1 Conditional Branches 


A conditional branch transfers control if the specified condition is TRUE. If the annul 
bit is 0, the instruction in the delay slot is always executed. If the annul bit is 1, the 
instruction in the delay slot is executed only when the conditional branch is taken. 
Note | The annulling behavior of a taken conditional branch is different
from that of an unconditional branch.


6.3.4.2 Unconditional Branches 


An unconditional branch transfers control unconditionally if its specified condition 
is “always”; it never transfers control if its specified condition is “never.” If the 
annul bit is 0, then the instruction in the delay slot is always executed. If the annul 
bit is 1, then the instruction in the delay slot is never executed. 


Note | The annul behavior of an unconditional branch is different from 
that of a taken conditional branch. 
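The branch rows of TABLE 6-5 can be condensed into a small C function. This is a sketch that assumes PC and NPC are tracked as a pair and that the branch condition has already been evaluated; the type and enumerator names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Program counter pair. */
typedef struct { uint64_t pc, npc; } pcs_t;

/* Branch groups from TABLE 6-5 (illustrative names). */
typedef enum { BR_COND, BR_ALWAYS, BR_NEVER } branch_kind_t;

/* New (PC, NPC) after a PC-relative branch executes in state s.
   `taken` is the evaluated condition for BR_COND and is ignored for
   BA and BN; `annul` is the instruction's annul bit. */
static pcs_t branch_next(pcs_t s, branch_kind_t kind, bool taken,
                         bool annul, uint64_t ea) {
    assert((ea & 3) == 0);                /* targets are word aligned */
    if (kind == BR_ALWAYS) taken = true;
    if (kind == BR_NEVER)  taken = false;

    if (kind == BR_ALWAYS && annul)       /* BA,a: not delayed, slot skipped */
        return (pcs_t){ ea, ea + 4 };
    if (taken)                            /* delay slot runs, then target */
        return (pcs_t){ s.npc, ea };
    if (annul)                            /* untaken and annulled: skip slot */
        return (pcs_t){ s.npc + 4, s.npc + 8 };
    return (pcs_t){ s.npc, s.npc + 4 };   /* untaken: fall through */
}
```

The BA row with the annul bit set is the only case in which the new PC is EA itself, matching the "Delayed: No" entry in the table.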


6.3.4.3 CALL and JMPL Instructions 


The CALL instruction writes the contents of the PC, which points to the CALL 
instruction itself, into R[15] (out register 7) and then causes a delayed transfer of 
control to a PC-relative effective address. The value written into R[15] is visible to 
the instruction in the delay slot. 


The JMPL instruction writes the contents of the PC, which points to the JMPL 
instruction itself, into R[rd] and then causes a register-indirect delayed transfer of 
control to the address given by "R[rs1] + R[rs2]" or "R[rs1] + a signed immediate
value". The value written into R[rd] is visible to the instruction in the delay slot.


When PSTATE.am = 1, the value of the high-order 32 bits transmitted to R[15] by the 
CALL instruction or to R[rd] by the JMPL instruction is zero. 
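A C sketch of the value CALL or JMPL writes to its destination register, together with the conventional return address described in the register-window programming note in 6.3.6 (8 plus the saved address, skipping the CTI and its delay slot). The synthetic instructions `ret` and `retl` (`jmpl %i7+8, %g0` and `jmpl %o7+8, %g0`) use this offset. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value written to R[15] by CALL (or R[rd] by JMPL): the address of the
   CTI itself, with the high-order 32 bits zeroed when PSTATE.am = 1. */
static uint64_t link_value(uint64_t pc, bool pstate_am) {
    assert((pc & 3) == 0);          /* instructions are word aligned */
    return pstate_am ? (pc & 0xFFFFFFFFULL) : pc;
}

/* Conventional return target: past the CTI and its delay slot. */
static uint64_t return_target(uint64_t link) {
    return link + 8;
}
```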


6.3.4.4 RETURN Instruction 


The RETURN instruction is used to return from a trap handler executing in 
nonprivileged mode. RETURN combines the control-transfer characteristics of a 
JMPL instruction with R[0] specified as the destination register and the register- 
window semantics of a RESTORE instruction. 


6.3.4.5 DONE and RETRY Instructions 


The DONE and RETRY instructions are used by privileged software to return from a 
trap. These instructions restore the machine state to values saved in the TSTATE 
register stack. 


RETRY returns to the instruction that caused the trap in order to reexecute it. DONE 
returns to the instruction pointed to by the value of NPC associated with the 
instruction that caused the trap, that is, the next logical instruction in the program. 
DONE presumes that the trap handler did whatever was requested by the program 
and that execution should continue. 


6.3.4.6 Trap Instruction (Tcc)


The Tcc instruction initiates a trap if the condition specified by its cond field matches 
the current state of the condition code register specified in its cc field; otherwise, it 
executes as a NOP. If the trap is taken, it increments the TL register, computes a trap 
type that is stored in TT[TL], and transfers to a computed address in a trap table 
pointed to by a trap base address register. 


A Tcc instruction can specify one of 256 software trap types (128 when in
nonprivileged mode). When a Tcc is taken, 256 plus the 7 (in nonprivileged mode) or 
8 (in privileged or hyperprivileged mode) least significant bits of the Tcc's second 
source operand are written to TT[TL]. The only visible difference between a software 
trap generated by a Tcc instruction and a hardware trap is the trap number in the TT 
register. See Chapter 12, Traps, for more information. 
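The trap-type computation can be written as a short C function. It assumes the caller has already formed the second-operand value (R[rs1] + R[rs2] or R[rs1] + imm_trap#); the function name is hypothetical, and the subsequent trap-table address computation from TBA is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trap type written to TT[TL] by a taken Tcc: 256 plus the low 7 bits
   (nonprivileged) or low 8 bits (privileged/hyperprivileged) of the
   second source operand. */
static unsigned tcc_trap_type(uint64_t operand, bool nonprivileged) {
    unsigned sw_trap = (unsigned)(operand & (nonprivileged ? 0x7F : 0xFF));
    unsigned tt = 256 + sw_trap;    /* software traps occupy 256..511 */
    assert(tt >= 256 && tt <= 511);
    return tt;
}
```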


Programming Note | Tcc can be used to implement breakpointing, tracing, and calls to privileged or hyperprivileged software. Tcc can also be used for runtime checks, such as out-of-range array index checks or integer overflow checks.


6.3.4.7 DCTI Couples


A delayed control transfer instruction (DCTI) in the delay slot of another DCTI is 
referred to as a "DCTI couple". The use of DCTI couples is deprecated in the 
UltraSPARC Architecture; no new software should place a DCTI in the delay slot of 
another DCTI, because on future UltraSPARC Architecture implementations DCTI
couples may execute either slowly or differently than the programmer assumes they
will.


SPARC V8 and SPARC V9 Compatibility Note | The SPARC V8 architecture left behavior undefined for a DCTI couple. The SPARC V9 architecture defined behavior in that case, but as of UltraSPARC Architecture 2005, use of DCTI couples is deprecated.


6.3.5 Conditional Move Instructions


This subsection describes two groups of instructions that copy or move the contents 
of any integer or floating-point register. 


MOVcc and FMOVcc Instructions. The MOVcc and FMOVcc instructions copy
the contents of any integer or floating-point register to a destination integer or
floating-point register if a condition is satisfied. The condition to test is specified in
the instruction and can be any of the conditions allowed in conditional delayed
control-transfer instructions. This condition is tested against one of the six sets of
condition codes (icc, xcc, fcc0, fcc1, fcc2, and fcc3), as specified by the instruction.
For example: 
fmovdg %fcc2, %f20, %f22


moves the contents of the double-precision floating-point register %f20 to register
%f22 if floating-point condition code number 2 (fcc2) indicates a greater-than
relation (FSR.fcc2 = 2). If fcc2 does not indicate a greater-than relation
(FSR.fcc2 ≠ 2), then the move is not performed.


The MOVcc and FMOVcc instructions can be used to eliminate some branches in 
programs. In most implementations, branches will be more expensive than the 
MOVcc or FMOVcc instructions. For example, the C statement: 


if (A > B) X = 1; else X = 0;





can be coded as 


cmp  %i0, %i2       ! (A > B)
or   %g0, 0, %i3    ! set X = 0
movg %xcc, 1, %i3   ! overwrite X with 1 if A > B


to eliminate the need for a branch. 
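For comparison, the C function below mirrors the three-instruction sequence line by line. It assumes A and B are signed 64-bit values, in which case the g condition on xcc after `cmp` is equivalent to a > b; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each statement corresponds to one instruction of the sequence. */
static int64_t x_from_compare(int64_t a, int64_t b) {
    bool g = a > b;     /* cmp  %i0, %i2     : xcc, condition "g" */
    int64_t x = 0;      /* or   %g0, 0, %i3  : X = 0              */
    x = g ? 1 : x;      /* movg %xcc, 1, %i3 : X = 1 if A > B     */
    return x;
}
```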


MOVr and FMOVr Instructions. The MOVr and FMOVr instructions allow the 
contents of any integer or floating-point register to be moved to a destination integer 
or floating-point register if the contents of a register satisfy a specified condition. 
The conditions to test are enumerated in TABLE 6-6. 


TABLE 6-6 MOVr and FMOVr Test Conditions

Condition   Description
NZ          Nonzero
Z           Zero
GEZ         Greater than or equal to zero
LZ          Less than zero
LEZ         Less than or equal to zero
GZ          Greater than zero


Any of the integer registers (treated as a signed value) may be tested for one of the 
conditions, and the result used to control the move. For example, 


movrnz %i2, %l4, %l6


moves integer register %l4 to integer register %l6 if integer register %i2 contains a
nonzero value.


MOVr and FMOVr can be used to eliminate some branches in programs or can 
emulate multiple unsigned condition codes by using an integer register to hold the 
result of a comparison. 
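The conditions in TABLE 6-6 and the MOVr data movement can be modeled in C as follows. The enum names are illustrative and do not reflect the instruction's condition-field encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register conditions from TABLE 6-6 (illustrative names). */
typedef enum { RC_Z, RC_LEZ, RC_LZ, RC_NZ, RC_GZ, RC_GEZ } rcond_t;

/* Tests R[rs1], treated as a signed value, against a condition. */
static bool rcond_holds(rcond_t c, int64_t v) {
    switch (c) {
    case RC_Z:   return v == 0;
    case RC_LEZ: return v <= 0;
    case RC_LZ:  return v < 0;
    case RC_NZ:  return v != 0;
    case RC_GZ:  return v > 0;
    case RC_GEZ: return v >= 0;
    }
    assert(!"unknown register condition");
    return false;
}

/* MOVr: R[rd] receives src if the condition holds, else is unchanged. */
static uint64_t movr(rcond_t c, int64_t r_rs1, uint64_t src, uint64_t r_rd) {
    return rcond_holds(c, r_rs1) ? src : r_rd;
}
```

For example, an unsigned comparison can first be materialized as 0 or 1 in an integer register and later consumed by MOVRNZ, which is the kind of condition-code emulation mentioned above.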
6.3.6 Register Window Management Instructions


This subsection describes the instructions that manage register windows in the 
UltraSPARC Architecture. The privileged registers affected by these instructions are 
described in Register-Window PR State Registers on page 86. 


6.3.6.1 SAVE Instruction 


The SAVE instruction allocates a new register window and saves the caller's register 
window by incrementing the CWP register. 


If CANSAVE = 0, then execution of a SAVE instruction causes a window spill
exception, that is, one of the spill_n_<normal|other> exceptions.


If CANSAVE ≠ 0 but the number of clean windows is zero, that is,
(CLEANWIN − CANRESTORE) = 0, then SAVE causes a clean_window exception.


If SAVE does not cause an exception, it performs an ADD operation, decrements 
CANSAVE, and increments CANRESTORE. The source registers for the ADD 
operation are from the old window (the one to which CWP pointed before the 
SAVE), while the result is written into a register in the new window (the one to 
which the incremented CWP points). 


6.3.6.2 RESTORE Instruction


The RESTORE instruction restores the previous register window by decrementing 
the CWP register. 


If CANRESTORE = 0, execution of a RESTORE instruction causes a window fill
exception, that is, one of the fill_n_<normal|other> exceptions.


If RESTORE does not cause an exception, it performs an ADD operation, decrements 
CANRESTORE, and increments CANSAVE. The source registers for the ADD are 
from the old window (the one to which CWP pointed before the RESTORE), and the 
result is written into a register in the new window (the one to which the 
decremented CWP points). 
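The window-accounting rules for SAVE and RESTORE can be sketched in C as below. N_REG_WINDOWS is implementation dependent and is fixed at 8 here purely as an assumption; the ADD performed by both instructions, OTHERWIN, and the trap handlers themselves are not modeled, and all type and function names are hypothetical.

```c
#include <assert.h>

#define N_REG_WINDOWS 8   /* assumption: implementation dependent */

typedef struct { unsigned cwp, cansave, canrestore, cleanwin; } winstate_t;

typedef enum { WIN_OK, WIN_SPILL, WIN_CLEAN_WINDOW, WIN_FILL } win_result_t;

typedef struct { win_result_t result; winstate_t state; } win_step_t;

/* SAVE: spill if CANSAVE = 0; clean_window if no clean windows remain;
   otherwise advance CWP and move one window from CANSAVE to CANRESTORE. */
static win_step_t do_save(winstate_t w) {
    assert(w.cleanwin >= w.canrestore);
    if (w.cansave == 0)
        return (win_step_t){ WIN_SPILL, w };        /* spill_n_<normal|other> */
    if (w.cleanwin == w.canrestore)
        return (win_step_t){ WIN_CLEAN_WINDOW, w }; /* clean_window */
    w.cwp = (w.cwp + 1) % N_REG_WINDOWS;
    w.cansave--;
    w.canrestore++;
    return (win_step_t){ WIN_OK, w };
}

/* RESTORE: fill if CANRESTORE = 0; otherwise step CWP back and move
   one window from CANRESTORE to CANSAVE. */
static win_step_t do_restore(winstate_t w) {
    if (w.canrestore == 0)
        return (win_step_t){ WIN_FILL, w };         /* fill_n_<normal|other> */
    w.cwp = (w.cwp + N_REG_WINDOWS - 1) % N_REG_WINDOWS;
    w.canrestore--;
    w.cansave++;
    return (win_step_t){ WIN_OK, w };
}
```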
Programming Note | This note describes a common convention for use of register windows, SAVE, RESTORE, CALL, and JMPL instructions.

A procedure is invoked by execution of a CALL (or a JMPL) instruction. If the procedure requires a register window, it executes a SAVE instruction in its prologue code. A routine that does not allocate a register window of its own (possibly a leaf procedure) should not modify any windowed registers except out registers 0 through 6. This optimization, called "Leaf-Procedure Optimization", is routinely performed by SPARC compilers.

A procedure that uses a register window returns by executing both a RESTORE and a JMPL instruction. A procedure that has not allocated a register window returns by executing a JMPL only. The target address for the JMPL instruction is normally 8 plus the address saved by the calling instruction, that is, the instruction after the instruction in the delay slot of the calling instruction.

The SAVE and RESTORE instructions can be used to atomically establish a new memory stack pointer in an R register and switch to a new or previous register window.





6.3.6.3 SAVED Instruction


SAVED is a privileged instruction used by a spill trap handler to indicate that a 
window spill has completed successfully. It increments CANSAVE and decrements 
either OTHERWIN or CANRESTORE, depending on the conditions at the time 
SAVED is executed. 


See SAVED on page 319 for details. 


6.3.6.4 RESTORED Instruction 


RESTORED is a privileged instruction, used by a fill trap handler to indicate that a 
window has been filled successfully. It increments CANRESTORE and decrements 
either OTHERWIN or CANSAVE, depending on the conditions at the time 
RESTORED is executed. RESTORED also manipulates CLEANWIN, which is used to 
ensure that no address space’s data become visible to another address space through 
windowed registers. 


See RESTORED on page 311 for details. 
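A C sketch of how SAVED and RESTORED update the window counters, based on the SPARC V9 definitions of these instructions (the SAVED and RESTORED pages cited above give the authoritative rules). N_REG_WINDOWS = 8 is an assumption, and the type and function names are illustrative.

```c
#include <assert.h>

#define N_REG_WINDOWS 8   /* assumption: implementation dependent */

typedef struct { unsigned cansave, canrestore, cleanwin, otherwin; } wincount_t;

/* SAVED (end of a spill handler): one more window can be saved into;
   it came from OTHERWIN if nonzero, otherwise from CANRESTORE. */
static wincount_t saved(wincount_t w) {
    w.cansave++;
    if (w.otherwin == 0) {
        assert(w.canrestore > 0);
        w.canrestore--;
    } else {
        w.otherwin--;
    }
    return w;
}

/* RESTORED (end of a fill handler): one more window can be restored;
   it came from OTHERWIN if nonzero, otherwise from CANSAVE. The filled
   window now holds valid data, so CLEANWIN grows up to N_REG_WINDOWS - 1. */
static wincount_t restored(wincount_t w) {
    w.canrestore++;
    if (w.otherwin == 0) {
        assert(w.cansave > 0);
        w.cansave--;
    } else {
        w.otherwin--;
    }
    if (w.cleanwin < N_REG_WINDOWS - 1)
        w.cleanwin++;
    return w;
}
```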


6.3.6.5 Flush Windows Instruction 


The FLUSHW instruction flushes all of the register windows, except the current 
window, by performing repetitive spill traps. The FLUSHW instruction causes a spill 
trap if any register window (other than the current window) has valid contents. The 
number of windows with valid contents is computed as: 


N_REG_WINDOWS − 2 − CANSAVE


If this number is nonzero, the FLUSHW instruction causes a spill trap. Otherwise, 
FLUSHW has no effect. If the spill trap handler exits with a RETRY instruction, the 
FLUSHW instruction continues causing spill traps until all the register windows 
except the current window have been flushed. 
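The count above, and the way repeated spill traps drive it to zero, can be sketched in C. The sketch assumes N_REG_WINDOWS = 8 and a spill handler that ends with SAVED followed by RETRY; the function names are hypothetical.

```c
#include <assert.h>

#define N_REG_WINDOWS 8   /* assumption: implementation dependent */

/* Windows, other than the current one, that still hold valid contents. */
static unsigned windows_to_flush(unsigned cansave) {
    assert(cansave <= N_REG_WINDOWS - 2);
    return N_REG_WINDOWS - 2 - cansave;
}

/* Number of spill traps FLUSHW causes: each handler's SAVED increments
   CANSAVE, and RETRY re-executes FLUSHW until the count reaches zero. */
static unsigned flushw_spill_count(unsigned cansave) {
    unsigned spills = 0;
    while (windows_to_flush(cansave) != 0) {
        cansave++;      /* SAVED in the spill handler */
        spills++;       /* RETRY re-executes FLUSHW */
    }
    return spills;
}
```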


6.3.7 Ancillary State Register (ASR) Access


The read/write state register instructions access program-visible state and status 
registers. These instructions read/write the state registers into/from R registers. A 
read/write Ancillary State register instruction is privileged only if the accessed 
register is privileged. 


The supported RDasr and WRasr instructions are described in Ancillary State 
Registers on page 70. 


6.3.8 Privileged Register Access


The read/write privileged register instructions access state and status registers that 
are visible only to privileged software. These instructions read/write privileged 
registers into/from R registers. The read/write privileged register instructions are 
privileged. 


6.3.9 Floating-Point Operate (FPop) Instructions


Floating-point operate instructions (FPops) compute a result that is a function of one
or two source operands and place the result in one or more destination F registers,
with one exception: floating-point compare operations do not write to an F register
but instead update one of the fccn fields of the FSR.


The term "FPop" refers to instructions in the FPop1 and FPop2 opcode spaces. FPop
instructions do not include FBfcc instructions, loads and stores between memory 
and the F registers, or non-floating-point operations that read or write F registers. 


The FMOVcc instructions function for the floating-point registers as the MOVcc 
instructions do for the integer registers. See MOVcc and FMOVcc Instructions on page 
129. 


The FMOVr instructions function for the floating-point registers as the MOVr 
instructions do for the integer registers. See MOVr and FMOVr Instructions on page 
130. 


If no floating-point unit is present or if PSTATE.pef = 0 or FPRS.fef = 0, then any 
instruction, including an FPop instruction, that attempts to access an FPU register 
generates an fp_disabled exception.


All FPop instructions clear the ftt field and set the cexc field unless they generate an 
exception. Floating-point compare instructions also write one of the fccn fields. All
FPop instructions that can generate IEEE exceptions set the cexc and aexc fields
unless they generate an exception. FABS<s|d|q>, FMOV<s|d|q>,
FMOVcc<s|d|q>, FMOVr<s|d|q>, and FNEG<s|d|q> cannot generate IEEE
exceptions, so they clear cexc and leave aexc unchanged. 
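A C sketch of the FSR updates described above, for an FPop that completes without generating an exception. It assumes the standard SPARC ordering of the five IEEE flags (nv, of, uf, dz, nx from most to least significant) and that aexc accrues the newly raised flags; the struct holds only the fields discussed here and is not the FSR bit layout, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE exception flags in cexc/aexc order. */
enum { EXC_NX = 0x01, EXC_DZ = 0x02, EXC_UF = 0x04, EXC_OF = 0x08, EXC_NV = 0x10 };

/* Only the FSR fields discussed in the text. */
typedef struct { uint8_t ftt, cexc, aexc; } fsr_fields_t;

/* `raised` holds the IEEE flags the operation produced; `can_raise_ieee`
   is false for FABS, FMOV, FMOVcc, FMOVr, and FNEG. */
static fsr_fields_t fpop_complete(fsr_fields_t f, uint8_t raised,
                                  bool can_raise_ieee) {
    assert((raised & ~0x1F) == 0);   /* five flag bits only */
    f.ftt = 0;                       /* every FPop clears ftt */
    if (can_raise_ieee) {
        f.cexc = raised;             /* current exceptions */
        f.aexc |= raised;            /* accrued exceptions accumulate */
    } else {
        assert(raised == 0);
        f.cexc = 0;                  /* aexc is left unchanged */
    }
    return f;
}
```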


IMPL. DEP. #3-V8: An implementation may indicate that a floating-point instruction did not produce a correct IEEE Std 754-1985 result by generating an fp_exception_other exception with FSR.ftt = unfinished_FPop or FSR.ftt = unimplemented_FPop. In this case, software running in a mode with greater privileges must emulate any functionality not present in the hardware.


See ftt = 2 (unfinished_FPop) on page 65 to see which instructions can produce an fp_exception_other exception (with FSR.ftt = unfinished_FPop). See ftt = 3 (unimplemented_FPop) on page 65 to see which instructions can produce an fp_exception_other exception (with FSR.ftt = unimplemented_FPop).


6.3.10 Implementation-Dependent Instructions


The SPARC V9 architecture provided two instruction spaces that are entirely 
implementation dependent: IMPDEP1 and IMPDEP2. 


In the UltraSPARC Architecture, the IMPDEP1 opcode space is used by VIS 
instructions. 


In the UltraSPARC Architecture, IMPDEP2 is subdivided into IMPDEP2A and 
IMPDEP2B. IMPDEP2A remains implementation dependent. The IMPDEP2B opcode 
space is reserved for implementation of floating-point multiply-add/multiply-subtract
instructions.


6.3.11 Reserved Opcodes and Instruction Fields


If a conforming UltraSPARC Architecture 2005 implementation attempts to execute 
an instruction bit pattern that is not specifically defined in this specification, it 
behaves as follows: 


• If the instruction bit pattern encodes an implementation-specific extension to the instruction set, that extension is executed.

• If the instruction bit pattern does not encode an extension to the instruction set, but would decode as a valid instruction if nonzero bits in reserved instruction field(s) were ignored (read as 0):

  ◦ The recommended behavior is to generate an illegal_instruction exception (or, for FPop, an fp_exception_other exception with FSR.ftt = 3 (unimplemented_FPop)).

  ◦ Alternatively, the implementation can ignore the nonzero reserved field bits and execute the instruction as if those bits had been zero.

• If the instruction bit pattern does not encode an extension to the instruction set and would still not decode as a valid instruction if nonzero bits in reserved instruction field(s) were ignored, then the instruction bit pattern is invalid and causes an exception. Specifically, attempting to execute an FPop instruction (see Floating-Point Operate on page 32) causes an fp_exception_other exception (with FSR.ftt = unimplemented_FPop); attempting to execute any other invalid instruction bit pattern causes an illegal_instruction exception.


Forward Compatibility Note | To further enhance backward (and forward) binary compatibility, the next revision of the UltraSPARC Architecture is expected to require an illegal_instruction exception to be generated by any instruction bit pattern that encodes neither a known UltraSPARC Architecture instruction nor an implementation-specific extension instruction (including those with nonzero bits in reserved instruction fields).





See Appendix A, Opcode Maps, for an enumeration of the reserved instruction bit 
patterns (opcodes). 


Implementation Note | As described above, implementations are strongly encouraged, but not strictly required, to trap on nonzero values in reserved instruction fields.


Programming Note | For software portability, software (such as assemblers, static
compilers, and dynamic compilers) that generates SPARC instructions must always
generate zeroes in instruction fields
marked "reserved" ("—"). 
CHAPTER 7


Instructions 





UltraSPARC Architecture 2005 extends the standard SPARC V9 instruction set with 
additional classes of instructions: 


• Enhanced functionality:
  ◦ Instructions for alignment (Align Address on page 149)
  ◦ Array handling (Three-Dimensional Array Addressing on page 152)
  ◦ Byte-permutation instructions (Byte Mask and Shuffle on page 158)
  ◦ Edge handling (Edge Handling Instructions on pages 170 and 172)
  ◦ Logical operations on floating-point registers (F Register Logical Operate (1 operand) on page 226)
  ◦ Partitioned arithmetic (Fixed-point Partitioned Add on page 218 and Fixed-point Partitioned Subtract (64-bit) on page 223)
  ◦ Pixel manipulation (FEXPAND on page 186, FPACK on page 212, and FPMERGE on page 221)
  ◦ Access to hyperprivileged state (such as RDHPR and WRHPR instructions)

• Efficient memory access:
  ◦ Partial store (Store Partial Floating-Point on page 347)
  ◦ Short floating-point loads and stores (Store Short Floating-Point on page 350)
  ◦ Block load and store (Block Load on page 247 and Block Store on page 335)

• Efficient interval arithmetic: SIAM (Set Interval Arithmetic Mode on page 325) and all instructions that reference GSR.im


TABLE 7-2 provides a quick index of instructions, alphabetically by architectural 
instruction name. 


TABLE 7-3 summarizes the instruction set, listed within functional categories. 
Within these tables and throughout the rest of this chapter, and in Appendix A,
Opcode Maps, certain opcodes are marked with mnemonic superscripts. The 
superscripts and their meanings are defined in TABLE 7-1. 


TABLE 7-1 Instruction Superscripts

Superscript   Meaning
D             Deprecated instruction (do not use in new software)
H             Hyperprivileged instruction
N             Nonportable instruction
P             Privileged instruction
P_ASI         Privileged action if bit 7 of the referenced ASI is 0
P_ASR         Privileged instruction if the referenced ASR register is privileged
P_npt         Privileged action if in nonprivileged mode (PSTATE.priv = 0 and
              HPSTATE.hpriv = 0) and nonprivileged access is disabled
              ((S)TICK.npt = 1)
P_PIC         Privileged action if PCR.priv = 1
TABLE 7-2 UltraSPARC Architecture 2005 Instruction Set - Alphabetical

Page  Instruction
148   ADD (ADDcc)
148   ADDC (ADDCcc)
149   ALIGNADDRESS[_LITTLE]
150   ALLCLEAN
151   AND (ANDcc)
152   ARRAY<8|16|32>
156   Bicc
158   BMASK
159   BPcc
162   BPr
158   BSHUFFLE
164   CALL
165   CASA^P_ASI
165   CASXA^P_ASI
168   DONE^P
170   EDGE<8|16|32>[L]cc
172   EDGE<8|16|32>[L]N
233   F<s|d|q>TO<s|d|q>
231   F<s|d|q>TOi
231   F<s|d|q>TOx
173   FABS<s|d|q>
174   FADD<s|d|q>
175   FALIGNDATA
229   FANDNOT<1|2>[s]
229   FAND[s]
176   FBfcc^D
178   FBPfcc
183   FCMP<s|d|q>
180   FCMP*<16,32>
183   FCMPE<s|d|q>
185   FDIV<s|d|q>
209   FdMULq
186   FEXPAND
187   FiTO<s|d|q>
188   FLUSH
192   FLUSHW
193   FMOV<s|d|q>
195   FMOV<s|d|q>cc
200   FMOV<s|d|q>R
209   FMUL<s|d|q>
203   FMUL8[SU|UL]x16
203   FMUL8x16
203   FMUL8x16[AU|AL]
203   FMULD8[SU|UL]x16
229   FNAND[s]
211   FNEG<s|d|q>
229   FNOR[s]
227   FNOT<1|2>[s]
226   FONE[s]
229   FORNOT<1|2>[s]
229   FOR[s]
212   FPACK<16|32|FIX>
218   FPADD<16,32>[s]
221   FPMERGE
223   FPSUB<16,32>[s]
209   FsMULd
230   FSQRT<s|d|q>
227   FSRC<1|2>[s]
235   FSUB<s|d|q>
229   FXNOR[s]
229   FXOR[s]
236   FxTO<s|d|q>
226   FZERO[s]
237   ILLTRAP
238   IMPDEP2A
238   IMPDEP2B
240   INVALW
241   JMPL
247   LDBLOCKF
251   LDDF
254   LDDFA^P_ASI
251   LDF
254   LDFA^P_ASI
258   LDFSR^D
251   LDQF
254   LDQFA^P_ASI
242   LDSB
244   LDSBA^P_ASI
242   LDSH
244   LDSHA^P_ASI
260   LDSHORTF
262   LDSTUB
263   LDSTUBA^P_ASI
242   LDSW
244   LDSWA^P_ASI
265   LDTW^D
267   LDTWA^D,P_ASI
270   LDTXA^N
242   LDUB
244   LDUBA^P_ASI
242   LDUH
244   LDUHA^P_ASI
242   LDUW
244   LDUWA^P_ASI
242   LDX
244   LDXA^P_ASI
273   LDXFSR
275   MEMBAR
279   MOVcc
283   MOVr
285   MULScc^D
287   MULX
288   NOP
289   NORMALW
290   OR (ORcc)
290   ORN (ORNcc)
291   OTHERW
292   PDIST
293   POPC
295   PREFETCH
295   PREFETCHA^P_ASI
303   RDASI
303   RDasr^P_ASR
303   RDCCR
303   RDFPRS
303   RDGSR
306   RDHPR^H
303   RDPC
303   RDPCR^P
303   RDPIC^P_PIC
307   RDPR^P
303   RDSOFTINT^P
303   RDSTICK_CMPR^P
303   RDSTICK^P_npt
303   RDTICK_CMPR^P
303   RDTICK^P_npt
311   RESTORED^P
309   RESTORE
313   RETRY^P
315   RETURN
319   SAVED^P
317   SAVE
321   SDIV^D (SDIVcc^D)
287   SDIVX
323   SETHI
324   SHUTDOWN^D,P
325   SIAM
326   SIR^H
327   SLL
327   SLLX
329   SMUL^D (SMULcc^D)
327   SRA
327   SRAX
327   SRL
327   SRLX
331   STB
332   STBA^P_ASI
335   STBLOCKF
339   STDF
341   STDFA^P_ASI
339   STF
341   STFA^P_ASI
345   STFSR^D
331   STH
332   STHA^P_ASI
347   STPARTIALF
339   STQF
341   STQFA^P_ASI
350   STSHORTF
352   STTW^D
354   STTWA^D,P_ASI
331   STW
332   STWA^P_ASI
331   STX
332   STXA^P_ASI
357   STXFSR
359   SUB (SUBcc)
359   SUBC (SUBCcc)
361   SWAPA^D,P_ASI
360   SWAP^D
363   TADDcc
364   TADDccTV^D
366   Tcc
369   TSUBcc
370   TSUBccTV^D
372   UDIV^D (UDIVcc^D)
287   UDIVX
374   UMUL^D (UMULcc^D)
376   WRASI
376   WRasr^P_ASR
376   WRCCR
376   WRFPRS
376   WRGSR
380   WRHPR^H
376   WRPCR^P
376   WRPIC^P_PIC
380   WRPR^P
376   WRSOFTINT_CLR^P
376   WRSOFTINT_SET^P
376   WRSOFTINT^P
376   WRSTICK_CMPR^P
376   WRSTICK^P
376   WRTICK_CMPR^P
376   WRY^D
385   XNOR (XNORcc)
385   XOR (XORcc)
TABLE 7-3 Instruction Set - by Functional Category

Instruction Category and Function Page Ext. to V9?
Data Movement Operations, Between R Registers
MOVcc Move integer register if condition is satisfied 279
MOVr Move integer register on contents of integer register 283
Data Movement Operations, Between F Registers
FMOV<s|d|q> Floating-point move 193
FMOV<s|d|q>cc Move floating-point register if condition is satisfied 195
FMOV<s|d|q>R Move f-p reg. if integer reg. contents satisfy condition 200
FSRC<1|2>[s] Copy source 227 VIS 1
Data Conversion Instructions
FiTO<s|d|q> Convert 32-bit integer to floating-point 187
F<s|d|q>TOi Convert floating-point to integer 231
F<s|d|q>TOx Convert floating-point to 64-bit integer 231
F<s|d|q>TO<s|d|q> Convert between floating-point formats 233
FxTO<s|d|q> Convert 64-bit integer to floating-point 236
Logical Operations on R Registers
AND (ANDcc) Logical and (and modify condition codes) 151
OR (ORcc) Inclusive-or (and modify condition codes) 290
ORN (ORNcc) Inclusive-or not (and modify condition codes) 290
XNOR (XNORcc) Exclusive-nor (and modify condition codes) 385
XOR (XORcc) Exclusive-or (and modify condition codes) 385
Logical Operations on F Registers
FAND[s] Logical and operation 229 VIS 1
FANDNOT<1|2>[s] Logical and operation with one inverted source 229 VIS 1
FNAND[s] Logical nand operation 229 VIS 1
FNOR[s] Logical nor operation 229 VIS 1
FNOT<1|2>[s] Copy negated source 227 VIS 1
FONE[s] One fill 226 VIS 1
FOR[s] Logical or operation 229 VIS 1
FORNOT<1|2>[s] Logical or operation with one inverted source 229 VIS 1
FXNOR[s] Logical xnor operation 229 VIS 1
FXOR[s] Logical xor operation 229 VIS 1
FZERO[s] Zero fill 226 VIS 1
Shift Operations on R Registers
SLL Shift left logical 327
SLLX Shift left logical, extended 327
SRA Shift right arithmetic 327
SRAX Shift right arithmetic, extended 327
SRL Shift right logical 327
SRLX Shift right logical, extended 327
Special Addressing Operations
ALIGNADDRESS[_LITTLE] Calculate address for misaligned data 149 VIS 1
ARRAY<8|16|32> 3-D array addressing instructions 152 VIS 1
FALIGNDATA Perform data alignment for misaligned data 175 VIS 1
Control Transfers
Bicc Branch on integer condition codes 156
BPcc Branch on integer condition codes with prediction 159
BPr Branch on contents of integer register with prediction 162
CALL Call and link 164
DONE Return from trap 168
FBfcc Branch on floating-point condition codes 176
FBPfcc Branch on floating-point condition codes with prediction 178
ILLTRAP Illegal instruction 237
JMPL Jump and link 241
RETRY Return from trap and retry 313
RETURN Return 315
SIR Software-initiated reset 326
Tcc Trap on integer condition codes 366
Byte Permutation
BMASK Set the GSR.mask field 158 VIS 2
BSHUFFLE Permute bytes as specified by GSR.mask 158 VIS 2
Data Formatting Operations on F Registers
FEXPAND Pixel expansion 186 VIS 1
FPACK<16|32|FIX> Pixel packing 212 VIS 1
FPMERGE Pixel merge 221 VIS 1
Memory Operations to/from F Registers
LDBLOCKF Block loads 247 VIS 1
STBLOCKF Block stores 335 VIS 1
LDDF Load double floating-point 251
LDDFA Load double floating-point from alternate space 254
LDF Load floating-point 251
LDFA Load floating-point from alternate space 254
LDQF Load quad floating-point 251
LDQFA Load quad floating-point from alternate space 254
LDSHORTF Short floating-point loads 260 VIS 1
STDF Store double floating-point 339
STDFA Store double floating-point into alternate space 341
STF Store floating-point 339
STFA Store floating-point into alternate space 341
STPARTIALF Partial Store instructions 347 VIS 1
STQF Store quad floating-point 339
STQFA Store quad floating-point into alternate space 341
STSHORTF Short floating-point stores 350 VIS 1
Memory Operations, Miscellaneous
LDFSR Load floating-point state register (lower) 258
LDXFSR Load floating-point state register 273
MEMBAR Memory barrier 275
PREFETCH Prefetch data 295
PREFETCHA Prefetch data from alternate space 295
STFSR Store floating-point state register (lower) 345
STXFSR Store floating-point state register 357
Atomic (Load-Store) Memory Operations to/from R Registers
CASA Compare and swap word in alternate space 165
CASXA Compare and swap doubleword in alternate space 165
LDSTUB Load-store unsigned byte 262
LDSTUBA Load-store unsigned byte in alternate space 263
SWAP Swap integer register with memory 360
SWAPA Swap integer register with memory in alternate space 361
Memory Operations to/from R Registers
LDSB Load signed byte 242
LDSBA Load signed byte from alternate space 244
LDSH Load signed halfword 242
LDSHA Load signed halfword from alternate space 244
LDSW Load signed word 242
LDSWA Load signed word from alternate space 244
LDTXA Load integer twin extended word from alternate space 270 VIS 2+
LDTW Load integer twin word 265
LDTWA Load integer twin word from alternate space 267
LDUB Load unsigned byte 242
LDUBA Load unsigned byte from alternate space 244
LDUH Load unsigned halfword 242
LDUHA Load unsigned halfword from alternate space 244
LDUW Load unsigned word 242
LDUWA Load unsigned word from alternate space 244
LDX Load extended 242
LDXA Load extended from alternate space 244
STB Store byte 331
STBA Store byte into alternate space 332
STTW Store twin word 352
STTWA Store twin word into alternate space 354
STH Store halfword 331
STHA Store halfword into alternate space 332
STW Store word 331
STWA Store word into alternate space 332
STX Store extended 331
STXA Store extended into alternate space 332
Floating-Point Arithmetic Operations
FABS<s|d|q> Floating-point absolute value 173
FADD<s|d|q> Floating-point add 174
FDIV<s|d|q> Floating-point divide 185
FdMULq Floating-point multiply double to quad 209
FMUL<s|d|q> Floating-point multiply 209
FNEG<s|d|q> Floating-point negate 211
FsMULd Floating-point multiply single to double 209
FSQRT<s|d|q> Floating-point square root 230
FSUB<s|d|q> Floating-point subtract 235
Floating-Point Comparison Operations
FCMP*<16,32> Compare four 16-bit signed values or two 32-bit signed values 180 VIS 1
FCMP<s|d|q> Floating-point compare 183
FCMPE<s|d|q> Floating-point compare (exception if unordered) 183
Register-Window Control Operations
ALLCLEAN Mark all register window sets as "clean" 150
INVALW Mark all register window sets as "invalid" 240
FLUSHW Flush register windows 192
NORMALW "Other" register windows become "normal" register windows 289
OTHERW "Normal" register windows become "other" register windows 291
RESTORE Restore caller's window 309
RESTORED Window has been restored 311
SAVE Save caller's window 317
SAVED Window has been saved 319
Miscellaneous Operations
FLUSH Flush instruction memory 188
IMPDEP2A Implementation-dependent instructions 238
IMPDEP2B Implementation-dependent instructions (reserved) 238
NOP No operation 288
SHUTDOWN Shut down the virtual processor 324 VIS 1
Integer SIMD Operations on F Registers
FPADD<16,32>[S] Fixed-point partitioned add 218 VIS 1
FPSUB<16,32>[S] Fixed-point partitioned subtract 223 VIS 1
Integer Arithmetic Operations on R Registers
ADD (ADDcc) Add (and modify condition codes) 148
ADDC (ADDCcc) Add with carry (and modify condition codes) 148
MULScc Multiply step (and modify condition codes) 285
MULX Multiply 64-bit integers 287
SDIV (SDIVcc) 32-bit signed integer divide (and modify condition codes) 321
SDIVX 64-bit signed integer divide 287
SMUL (SMULcc) Signed integer multiply (and modify condition codes) 329
SUB (SUBcc) Subtract (and modify condition codes) 359
SUBC (SUBCcc) Subtract with carry (and modify condition codes) 359
TADDcc Tagged add and modify condition codes 363
TADDccTV Tagged add and modify condition codes (trap on overflow) 364
TSUBcc Tagged subtract and modify condition codes 369
TSUBccTV Tagged subtract and modify condition codes (trap on overflow) 370
UDIV (UDIVcc) Unsigned integer divide (and modify condition codes) 372
UDIVX 64-bit unsigned integer divide 287
UMUL (UMULcc) Unsigned integer multiply (and modify condition codes) 374
Integer Arithmetic Operations on F Registers
FMUL8x16 8x16 partitioned product 203 VIS 1
FMUL8x16[AU|AL] 8x16 upper/lower α partitioned product 203 VIS 1
FMUL8[SU|UL]x16 8x16 upper/lower partitioned product 203 VIS 1
FMULD8[SU|UL]x16 8x16 upper/lower partitioned product 203 VIS 1
Miscellaneous Operations on R Registers
POPC Population count 293
SETHI Set high 22 bits of low word of integer register 323
Miscellaneous Operations on F Registers
EDGE<8|16|32>[L]cc Edge handling instructions (and modify condition codes) 170 VIS 1
EDGE<8|16|32>[L]N Edge handling instructions 172 VIS 2
PDIST Pixel component distance 292 VIS 1
Control and Status Register Access
RDASI Read ASI register 303
RDasr Read ancillary state register 303
RDCCR Read Condition Codes register (CCR) 303
RDFPRS Read Floating-Point Registers State register (FPRS) 303
RDGSR Read General Status register (GSR) 303
RDPC Read Program Counter register (PC) 303
RDPCR Read Performance Control register (PCR) 303
RDPIC Read Performance Instrumentation Counters register (PIC) 303
RDHPR Read hyperprivileged register 306
RDPR Read privileged register 307
RDSOFTINT Read per-virtual processor Soft Interrupt register (SOFTINT) 303
RDSTICK Read System Tick register (STICK) 303
RDSTICK_CMPR Read System Tick Compare register (STICK_CMPR) 303
RDTICK Read Tick register (TICK) 303
RDTICK_CMPR Read Tick Compare register (TICK_CMPR) 303
SIAM Set interval arithmetic mode 325 VIS 2
WRASI Write ASI register 376
WRasr Write ancillary state register 376
WRCCR Write Condition Codes register (CCR) 376
WRFPRS Write Floating-Point Registers State register (FPRS) 376
WRGSR Write General Status register (GSR) 376
WRPCR Write Performance Control register (PCR) 376
WRPIC Write Performance Instrumentation Counters register (PIC) 376
WRHPR Write hyperprivileged register 380
WRPR Write privileged register 380
WRSOFTINT Write per-virtual processor Soft Interrupt register (SOFTINT) 376
WRSOFTINT_CLR Clear bits of per-virtual processor Soft Interrupt register (SOFTINT) 376
WRSOFTINT_SET Set bits of per-virtual processor Soft Interrupt register (SOFTINT) 376
WRTICK_CMPR Write Tick Compare register (TICK_CMPR) 376
WRSTICK Write System Tick register (STICK) 376
WRSTICK_CMPR Write System Tick Compare register (STICK_CMPR) 376
WRY Write Y register 376
In the remainder of this chapter, related instructions are grouped into subsections.
Each subsection consists of the following sets of information:


(1) Instruction Table. This lists the instructions that are defined in the subsection,
including the values of the field(s) that uniquely identify the instruction(s), assembly
language syntax, and software and implementation classifications for the
instructions. (A description of the Software Classes [letters] and Implementation Classes
[digits] will be provided in a later update to this specification.)


Note | Instruction classes will be defined in a later draft of this document 
and in the meantime are subject to change. 


(2) Illustration of Instruction Format(s). These illustrations show how the 
instruction is encoded in a 32-bit word in memory. In them, a dash (—) indicates 
that the field is reserved for future versions of the architecture and must be 0 in any 
instance of the instruction. If a conforming UltraSPARC Architecture 
implementation encounters nonzero values in these fields, its behavior is as defined 
in Reserved Opcodes and Instruction Fields on page 134. 


(3) Description. This subsection describes the operation of the instruction, its 
features, restrictions, and exception-causing conditions. 


(4) Exceptions. The exceptions that can occur as a consequence of attempting to
execute the instruction(s). Exceptions due to an instruction_access_exception,
fast_instruction_access_MMU_miss, WDR, and interrupts are not listed because they
can occur on any instruction. An FPop that is not implemented in hardware
generates an fp_exception_other exception with FSR.ftt = unimplemented_FPop
when executed. A non-FPop instruction not implemented in hardware generates an
illegal_instruction exception and therefore will not generate any of the other
exceptions listed. Exceptions are listed in order of trap priority (see Trap Priorities on
page 485), from highest to lowest priority.


(5) See Also. A list of related instructions (on selected pages). 


Note | This specification does not contain any timing information (in 
either cycles or elapsed time), since timing is always 
implementation dependent. 
ADD





7.1 Add 


Instruction  op3      Operation                              Assembly Language Syntax             Class
ADD          00 0000  Add                                    add     reg_rs1, reg_or_imm, reg_rd  A1
ADDcc        01 0000  Add and modify cc's                    addcc   reg_rs1, reg_or_imm, reg_rd  A1
ADDC         00 1000  Add with 32-bit Carry                  addc    reg_rs1, reg_or_imm, reg_rd  A1
ADDCcc       01 1000  Add with 32-bit Carry and modify cc's  addccc  reg_rs1, reg_or_imm, reg_rd  A1


[Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | reserved (12:5) | rs2 (4:0); with i = 1, bits 12:0 hold simm13]


Description If i = 0, ADD and ADDcc compute "R[rs1] + R[rs2]". If i = 1, they compute
"R[rs1] + sign_ext(simm13)". In either case, the sum is written to R[rd].


ADDC and ADDCcc ("ADD with carry") also add the CCR register's 32-bit carry
(icc.c) bit. That is, if i = 0, they compute "R[rs1] + R[rs2] + icc.c" and if i = 1, they
compute "R[rs1] + sign_ext(simm13) + icc.c". In either case, the sum is written to
R[rd].


ADDcc and ADDCcc modify the integer condition codes (CCR.icc and CCR.xcc). 
Overflow occurs on addition if both operands have the same sign and the sign of the 
sum is different from that of the operands. 


Programming | ADDC and ADDCcc read the 32-bit condition codes’ carry bit 
Note | (CCR.icc.c), not the 64-bit condition codes’ carry bit (CCR.xcc.c). 
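The condition-code rules above can be sketched as a small C reference model. This is an illustrative sketch under stated assumptions, not a definitive implementation: the names cc_t, ccr_t, sign_ext13 and add_cc are hypothetical, and the plain ADD and ADDC forms would return the same sum without updating the condition codes. It shows overflow detected from the operand and sum signs, carry taken as the carry out of bit 31 (icc) or bit 63 (xcc), and ADDCcc consuming icc.c rather than xcc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One set of integer condition codes. */
typedef struct { bool n, z, v, c; } cc_t;

/* Hypothetical model of CCR: 32-bit (icc) and 64-bit (xcc) condition codes. */
typedef struct { cc_t icc, xcc; } ccr_t;

/* sign_ext(simm13): interpret the low 13 bits as a two's-complement value. */
static int64_t sign_ext13(uint32_t simm13) {
    int64_t v = (int64_t)(simm13 & 0x1FFF);
    return (v & 0x1000) ? v - 0x2000 : v;
}

/* Sketch of ADDcc (with_carry = false) and ADDCcc (with_carry = true).
 * Returns the value written to R[rd] and updates both icc and xcc. */
static uint64_t add_cc(uint64_t a, uint64_t b, bool with_carry, ccr_t *ccr) {
    /* ADDC/ADDCcc read the 32-bit carry (icc.c), never xcc.c. */
    unsigned cin = (with_carry && ccr->icc.c) ? 1u : 0u;
    uint64_t partial = a + b;
    uint64_t sum = partial + cin;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, s32 = (uint32_t)sum;

    /* 64-bit condition codes */
    ccr->xcc.n = (sum >> 63) != 0;
    ccr->xcc.z = (sum == 0);
    /* overflow: both operands share a sign that differs from the sum's sign */
    ccr->xcc.v = (((a ^ sum) & (b ^ sum)) >> 63) != 0;
    ccr->xcc.c = (partial < a) || (sum < partial);          /* carry out of bit 63 */

    /* 32-bit condition codes, computed from bits 31:0 only */
    ccr->icc.n = (s32 >> 31) != 0;
    ccr->icc.z = (s32 == 0);
    ccr->icc.v = (((a32 ^ s32) & (b32 ^ s32)) >> 31) != 0;
    ccr->icc.c = (((uint64_t)a32 + b32 + cin) >> 32) != 0;  /* carry out of bit 31 */

    return sum;
}
```

For example, adding 0x7FFFFFFF and 1 sets icc.v and icc.n but leaves xcc.v and xcc.n clear, because the 64-bit sum 0x80000000 is still positive.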


SPARC V8 
Compatibility 
Note 


ADDC and ADDCcc were previously named ADDX and
ADDXcc, respectively, in SPARC V8.





An attempt to execute an ADD, ADDcc, ADDC or ADDCcc instruction when i = 0
and reserved instruction bits 12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
ALIGNADDRESS





7.2 Align Address 


Instruction          opf          Operation                                                     Assembly Language Syntax              Class
ALIGNADDRESS         0 0001 1000  Calculate address for misaligned data access                  alignaddr   reg_rs1, reg_rs2, reg_rd  A1
ALIGNADDRESS_LITTLE  0 0001 1010  Calculate address for misaligned data access little-endian    alignaddrl  reg_rs1, reg_rs2, reg_rd  A1


[Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description ALIGNADDRESS adds two integer values, R[rs1] and R[rs2], and stores the result 
(with the least significant 3 bits forced to 0) in the integer register R[rd]. The least 
significant 3 bits of the result are stored in the GSR.align field. 


ALIGNADDRESS_LITTLE is the same as ALIGNADDRESS except that the two's
complement of the least significant 3 bits of the result is stored in GSR.align.


Note | ALIGNADDRESS_LITTLE generates the opposite-endian byte
ordering for a subsequent FALIGNDATA operation.


A byte-aligned 64-bit load can be performed as shown below. 





alignaddr  Address, Offset, Address   ! set GSR.align
ldd        [Address], %d0
ldd        [Address + 8], %d2
faligndata %d0, %d2, %d4              ! use GSR.align to select bytes
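As a rough model of the sequence above, the C sketch below assumes that memory is a big-endian byte array indexed by address. It also treats FALIGNDATA, which is documented on page 175 and not in this section, as selecting the 8 bytes of the 16-byte concatenation of its two sources that start at byte GSR.align. All function names (alignaddr, alignaddr_little, faligndata, load_be64, misaligned_load) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ALIGNADDRESS: R[rd] = R[rs1] + R[rs2] with bits 2:0 cleared; GSR.align = bits 2:0. */
static uint64_t alignaddr(uint64_t rs1, uint64_t rs2, unsigned *gsr_align) {
    uint64_t sum = rs1 + rs2;
    *gsr_align = (unsigned)(sum & 7);
    return sum & ~(uint64_t)7;
}

/* ALIGNADDRESS_LITTLE: same address, but GSR.align receives the two's
 * complement of the low 3 bits, taken modulo 8 (GSR.align is 3 bits wide). */
static uint64_t alignaddr_little(uint64_t rs1, uint64_t rs2, unsigned *gsr_align) {
    uint64_t sum = rs1 + rs2;
    *gsr_align = (unsigned)((0 - sum) & 7);
    return sum & ~(uint64_t)7;
}

/* Assumed FALIGNDATA behaviour: the 8 bytes of (hi :: lo) starting at byte
 * offset `align`, where byte 0 is the most significant byte of hi. */
static uint64_t faligndata(uint64_t hi, uint64_t lo, unsigned align) {
    if (align == 0)
        return hi;                       /* avoids an undefined 64-bit shift */
    return (hi << (8 * align)) | (lo >> (64 - 8 * align));
}

/* Stand-in for ldd: read a big-endian doubleword from a byte array. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Mirrors alignaddr / ldd / ldd / faligndata. `mem` must hold at least
 * 16 bytes starting at the aligned address. */
static uint64_t misaligned_load(const uint8_t *mem, uint64_t address, uint64_t offset) {
    unsigned align;
    uint64_t base = alignaddr(address, offset, &align);
    uint64_t d0 = load_be64(mem + base);
    uint64_t d2 = load_be64(mem + base + 8);
    return faligndata(d0, d2, align);
}
```

With a buffer holding bytes 0, 1, 2, ..., a misaligned load at offset 3 yields 0x030405060708090A, the same value a direct big-endian read at byte 3 would produce.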





If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no 
FPU is present, an attempt to execute an ALIGNADDRESS or 
ALIGNADDRESS_LITTLE instruction causes an fp_disabled exception. 


Exceptions fp_disabled


See Also Align Data on page 175 
ALLCLEAN





7.3 Mark All Register Window Sets “Clean” 





Instruction  Operation                               Assembly Language Syntax  Class
ALLCLEAN     Mark all register window sets as “clean”  allclean                 A1

[Instruction format: op = 10 (bits 31:30) | fcn = 0 0010 (29:25) | op3 = 11 0001 (24:19) | reserved (18:0)]


Description The ALLCLEAN instruction marks all register window sets as “clean”; specifically, it 
performs the following operation: 


CLEANWIN ← (N_REG_WINDOWS - 1)


Programming | ALLCLEAN is used to indicate that all register windows are
Note | “clean”; that is, they do not contain data belonging to other address
spaces. It is needed because the value of N_REG_WINDOWS is not
known to privileged software.


This instruction allows window manipulations to be atomic, 
without the value of N_REG_WINDOWS being visible to privileged 
software and without an assumption that N_REG_WINDOWS is 
constant (since hyperprivileged software can migrate a thread 
among virtual processors, across which N_REG_WINDOWS may 
vary). 





Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005)
privileged_opcode


See Also INVALW on page 240 
NORMALW on page 289 
OTHERW on page 291 
RESTORED on page 311 
SAVED on page 319 


150 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


AND, ANDN 





7.4 AND Logical Operation


Instruction  op3      Operation                  Assembly Language Syntax             Class
AND          00 0001  and                        and     reg_rs1, reg_or_imm, reg_rd  A1
ANDcc        01 0001  and and modify cc's        andcc   reg_rs1, reg_or_imm, reg_rd  A1
ANDN         00 0101  and not                    andn    reg_rs1, reg_or_imm, reg_rd  A1
ANDNcc       01 0101  and not and modify cc's    andncc  reg_rs1, reg_or_imm, reg_rd  A1


[Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | reserved (12:5) | rs2 (4:0); with i = 1, bits 12:0 hold simm13]


Description These instructions implement bitwise logical and operations. They compute
"R[rs1] op R[rs2]" if i = 0, or "R[rs1] op sign_ext(simm13)" if i = 1, and write the
result into R[rd].


ANDcc and ANDNcc modify the integer condition codes (icc and xcc). They set the
condition codes as follows:


icc.v, icc.c, xcc.v, and xcc.c are set to 0

icc.n is copied from bit 31 of the result

xcc.n is copied from bit 63 of the result

icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0)
xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0)


ANDN and ANDNcc logically negate their second operand before applying the
main (and) operation.


An attempt to execute an AND, ANDcc, ANDN or ANDNcc instruction when i = 0
and reserved instruction bits 12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
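The condition-code rules listed for ANDcc and ANDNcc can be expressed as a minimal C sketch. It assumes the second operand (R[rs2] or sign_ext(simm13)) has already been selected by the caller; and_cc and cc_t are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } cc_t;

/* Sketch of ANDcc (invert_op2 = false) and ANDNcc (invert_op2 = true). */
static uint64_t and_cc(uint64_t rs1, uint64_t op2, bool invert_op2,
                       cc_t *icc, cc_t *xcc) {
    uint64_t r = rs1 & (invert_op2 ? ~op2 : op2);  /* ANDN negates operand 2 */
    icc->v = icc->c = xcc->v = xcc->c = false;     /* v and c always cleared */
    icc->n = ((r >> 31) & 1) != 0;                 /* bit 31 of the result */
    xcc->n = ((r >> 63) & 1) != 0;                 /* bit 63 of the result */
    icc->z = ((uint32_t)r == 0);                   /* bits 31:0 all zero */
    xcc->z = (r == 0);                             /* all 64 bits zero */
    return r;
}
```

Because icc and xcc look at different widths of the same result, a value such as 0x100000000 sets icc.z while leaving xcc.z clear.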


CHAPTER 7 * Instructions 151 


ARRAY<8|16|32> 





7.5 Three-Dimensional Array Addressing 
[VIS 1] 








Instruction  opf          Operation                                          Assembly Language Syntax           Class
ARRAY8       0 0001 0000  Convert 8-bit 3D address to blocked byte address   array8   reg_rs1, reg_rs2, reg_rd  C3
ARRAY16      0 0001 0010  Convert 16-bit 3D address to blocked byte address  array16  reg_rs1, reg_rs2, reg_rd  C3
ARRAY32      0 0001 0100  Convert 32-bit 3D address to blocked byte address  array32  reg_rs1, reg_rs2, reg_rd  C3


[Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description These instructions convert three-dimensional (3D) fixed-point addresses contained 
in R[rs1] to a blocked-byte address; they store the result in R[rd]. Fixed-point 
addresses typically are used for address interpolation for planar reformatting 
operations. Blocking is performed at the 64-byte level to maximize external cache 
block reuse, and at the 64-Kbyte level to maximize TLB entry reuse, regardless of the 
orientation of the address interpolation. These instructions specify an element size of 
8 bits (ARRAY8), 16 bits (ARRAY16), or 32 bits (ARRAY32).


The second operand, R[rs2], specifies the power-of-2 size of the X and Y dimensions 
of a 3D image array. The legal values for R[rs2] and their meanings are shown in 
TABLE 7-4. Illegal values produce undefined results in the destination register, R[rd]. 


TABLE 7-4 3D R[rs2] Array X and Y Dimensions
R[rs2] Value (n)  Number of Elements
0                 64
1                 128
2                 256
3                 512
4                 1024
5                 2048





Implementation | Architecturally, an illegal R[rs2] value (>5) causes the array 
Note | instructions to produce undefined results. For historic reference, 
past implementations of these instructions have ignored 
R[rs2]{63:3} and have treated R[rs2] values of 6 and 7 as if they 
were 5. 


The array instructions facilitate 3D texture mapping and volume rendering by 
computing a memory address for data lookup based on fixed-point x, y, and z 
coordinates. The data are laid out in a blocked fashion, so that points which are near 
one another have their data stored in nearby memory locations. 
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ARRAY<8|16|32> 


If the texture data were laid out in the obvious fashion (the z = 0 plane, followed by 
the z = 1 plane, etc.), then even small changes in z would result in references to 
distant pages in memory. The resulting lack of locality would tend to result in TLB 
misses and poor performance. The three versions of the array instruction, ARRAY8, 
ARRAY16, and ARRAY32, differ only in the scaling of the computed memory offsets. 
ARRAY16 shifts its result left by one position and ARRAY32 shifts left by two in 
order to handle 16- and 32-bit texture data. 


When using the array instructions, a “blocked-byte” data formatting structure is
imposed. The N x N x M volume, where N = 2^n x 64, M = m x 32, 0 ≤ n ≤ 5, 1 ≤ m ≤ 16,
should be composed of 64 x 64 x 32 smaller volumes, which in turn should be
composed of 4 x 4 x 2 volumes. This data structure is optimal for 16-bit data. For
16-bit data, the 4 x 4 x 2 volume has 64 bytes of data, which is ideal for reducing
cache-line misses; the 64 x 64 x 32 volume will have 256 Kbytes of data, which is good for
improving the TLB hit rate. FIGURE 7-1 illustrates how the data has to be organized,
where the origin (0,0,0) is assumed to be at the lower-left front corner and the x
coordinate varies faster than y, which varies faster than z. That is, when traversing the
volume from the origin to the upper right back, you go from left to right, front to back,
bottom to top.





[Figure: blocked volume with edge labels 16 x 4 = 64, 16 x 2 = 32, and N = 2^n x 64]


FIGURE 7-1 Blocked-Byte Data Formatting Structure


The array instructions have 2 inputs: 


The (x,y,z) coordinates are input via a single 64-bit integer organized in R[rs1] as 
shown in FIGURE 7-2. 








Z integer (63:55) | Z fraction (54:44) | Y integer (43:33) | Y fraction (32:22) | X integer (21:11) | X fraction (10:0)


FIGURE 7-2 Three-Dimensional Array Fixed-Point Address Format


Note that z has only 9 integer bits, as opposed to 11 for x and y. Also note that since 
(x,y,z) are all contained in one 64-bit register, they can be incremented or 
decremented simultaneously with a single add or subtract instruction (ADD or 
SUB). 
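The hypothetical C helpers below illustrate why a single ADD can step all three coordinates. They pack fixed-point x, y, and z, each with 11 fraction bits, into the FIGURE 7-2 layout and then step all three with one 64-bit add. The sketch assumes every step is non-negative and that no coordinate overflows its field. Otherwise, a carry out of one field would spill into the next coordinate.

```c
#include <assert.h>
#include <stdint.h>

/* Pack fixed-point coordinates into the FIGURE 7-2 layout.
 * x and y: 22 bits each (11 integer + 11 fraction); z: 20 bits (9 + 11). */
static uint64_t pack_xyz(uint32_t x, uint32_t y, uint32_t z) {
    return ((uint64_t)(z & 0xFFFFFu) << 44) |    /* Z: bits 63:44 */
           ((uint64_t)(y & 0x3FFFFFu) << 22) |   /* Y: bits 43:22 */
           (uint64_t)(x & 0x3FFFFFu);            /* X: bits 21:0  */
}

/* Integer parts, i.e. the fields the array instructions consume. */
static uint32_t x_integer(uint64_t a) { return (uint32_t)(a >> 11) & 0x7FFu; }
static uint32_t y_integer(uint64_t a) { return (uint32_t)(a >> 33) & 0x7FFu; }
static uint32_t z_integer(uint64_t a) { return (uint32_t)(a >> 55) & 0x1FFu; }
```

Starting from (1, 2, 3) and adding a packed step of (0.5, 0.5, 0.5) twice moves every coordinate forward by exactly one whole unit.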


So for a 512 x 512 x 32 or a 512 x 512 x 256 volume, the size value is 3. Note that the 
x and y size of the volume must be the same. The z size of the volume is a multiple 
of 32, ranging between 32 and 512. 


The array instructions generate an integer memory offset that, when added to the
base address of the volume, gives the address of the volume element (voxel) and can
be used by a load instruction. The offset is correct only if the data has been
reformatted as specified above.


The integer parts of x, y, and z are converted to the following blocked-address
formats as shown in FIGURE 7-3 for ARRAY8, FIGURE 7-4 for ARRAY16, and FIGURE 7-5
for ARRAY32.




















UPPER:  Z (bits 20+2n:17+2n) | Y (bits 17+2n-1:17+n) | X (bits 17+n-1:17)
MIDDLE: Z (bits 16:13) | Y (bits 12:9) | X (bits 8:5)
LOWER:  Z (bit 4) | Y (bits 3:2) | X (bits 1:0)


FIGURE 7-3 Three-Dimensional Array Blocked-Address Format (ARRAY8)





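UPPER:  Z (bits 21+2n:18+2n) | Y (bits 18+2n-1:18+n) | X (bits 18+n-1:18)
MIDDLE: Z (bits 17:14) | Y (bits 13:10) | X (bits 9:6)
LOWER:  Z (bit 5) | Y (bits 4:3) | X (bits 2:1) | 0 (bit 0)


FIGURE 7-4 Three-Dimensional Array Blocked-Address Format (ARRAY16)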





















































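UPPER:  Z (bits 22+2n:19+2n) | Y (bits 19+2n-1:19+n) | X (bits 19+n-1:19)
MIDDLE: Z (bits 18:15) | Y (bits 14:11) | X (bits 10:7)
LOWER:  Z (bit 6) | Y (bits 5:4) | X (bits 3:2) | 00 (bits 1:0)
FIGURE 7-5 Three-Dimensional Array Blocked-Address Format (ARRAY32)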
The bits above Z upper are set to 0. The number of zeroes in the least significant bits 
is determined by the element size. An element size of 8 bits has no zeroes, an 
element size of 16 bits has one zero, and an element size of 32 bits has two zeroes. 
Bits in X and Y above the size specified by R[rs2] are ignored. 
TABLE 7-5 ARRAY8 Description
Result (R[rd]) Bits  Source (R[rs1]) Bits  Field Information
1:0                  12:11                 X_integer{1:0}
3:2                  34:33                 Y_integer{1:0}
4                    55                    Z_integer{0}
8:5                  16:13                 X_integer{5:2}
12:9                 38:35                 Y_integer{5:2}
16:13                59:56                 Z_integer{4:1}
17+n-1:17            17+n-1:17             X_integer{6+n-1:6}
17+2n-1:17+n         39+n-1:39             Y_integer{6+n-1:6}
20+2n:17+2n          63:60                 Z_integer{8:5}
63:20+2n+1           n/a                   0
In the above description, if n = 0, there are 64 elements, so X_integer{6} and
Y_integer{6} are not defined. That is, result{20:17} equals Z_integer{8:5}.
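TABLE 7-5 translates almost row for row into code. The C sketch below assumes that R[rs2] holds a legal size value (0 to 5), because larger values are architecturally undefined and are not modeled. The names array8, array16, array32 and field are hypothetical. ARRAY16 and ARRAY32 are modeled as the one- and two-position left shifts described earlier in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi:lo of v (field narrower than 64 bits). */
static uint64_t field(uint64_t v, unsigned hi, unsigned lo) {
    return (v >> lo) & ((1ULL << (hi - lo + 1)) - 1);
}

/* Sketch of ARRAY8 following TABLE 7-5; n is the R[rs2] size value. */
static uint64_t array8(uint64_t rs1, unsigned n) {
    assert(n <= 5);                        /* larger values are undefined */
    uint64_t x = field(rs1, 21, 11);       /* X_integer, 11 bits */
    uint64_t y = field(rs1, 43, 33);       /* Y_integer, 11 bits */
    uint64_t z = field(rs1, 63, 55);       /* Z_integer, 9 bits  */
    uint64_t r = 0;

    r |= field(x, 1, 0);                   /* result 1:0   */
    r |= field(y, 1, 0) << 2;              /* result 3:2   */
    r |= field(z, 0, 0) << 4;              /* result 4     */
    r |= field(x, 5, 2) << 5;              /* result 8:5   */
    r |= field(y, 5, 2) << 9;              /* result 12:9  */
    r |= field(z, 4, 1) << 13;             /* result 16:13 */
    if (n > 0) {                           /* X/Y bits 6 and up exist only for n > 0 */
        r |= field(x, 6 + n - 1, 6) << 17;
        r |= field(y, 6 + n - 1, 6) << (17 + n);
    }
    r |= field(z, 8, 5) << (17 + 2 * n);   /* result 20+2n:17+2n */
    return r;                              /* higher bits remain 0 */
}

/* ARRAY16 and ARRAY32 only rescale the offset for 2- and 4-byte elements. */
static uint64_t array16(uint64_t rs1, unsigned n) { return array8(rs1, n) << 1; }
static uint64_t array32(uint64_t rs1, unsigned n) { return array8(rs1, n) << 2; }
```

Feeding single-bit coordinates through the function is a quick way to confirm each table row. For example, setting only Z_integer{5} lands in result bit 17 when n = 0 and in bit 21 when n = 2.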
Note | To maximize reuse of external cache and TLB data, software 
should block array references of a large image to the 64-Kbyte 
level. This means processing elements within a 32 x 32 x 64 
block. 
The code fragment below shows assembly of components along an interpolated line 
at the rate of one component per clock. 
add Addr, DeltaAddr, Addr 
array8 Addr, %g0, bAddr 
ldda [bAddr] #ASI_FL8_PRIMARY, data
faligndata data, accum, accum 
Exceptions None 
Bicc





7.6 Branch on Integer Condition Codes 
(Bicc) 





Opcode  cond  Operation                                                icc Test              Assembly Language Syntax  Class
BA      1000  Branch Always                                            1                     ba{,a} label              A1
BN      0000  Branch Never                                             0                     bn{,a} label              A1
BNE     1001  Branch on Not Equal                                      not Z                 bne†{,a} label            A1
BE      0001  Branch on Equal                                          Z                     be‡{,a} label             A1
BG      1010  Branch on Greater                                        not (Z or (N xor V))  bg{,a} label              A1
BLE     0010  Branch on Less or Equal                                  Z or (N xor V)        ble{,a} label             A1
BGE     1011  Branch on Greater or Equal                               not (N xor V)         bge{,a} label             A1
BL      0011  Branch on Less                                           N xor V               bl{,a} label              A1
BGU     1100  Branch on Greater Unsigned                               not (C or Z)          bgu{,a} label             A1
BLEU    0100  Branch on Less or Equal Unsigned                         C or Z                bleu{,a} label            A1
BCC     1101  Branch on Carry Clear (Greater Than or Equal, Unsigned)  not C                 bcc§{,a} label            A1
BCS     0101  Branch on Carry Set (Less Than, Unsigned)                C                     bcs¶{,a} label            A1
BPOS    1110  Branch on Positive                                       not N                 bpos{,a} label            A1
BNEG    0110  Branch on Negative                                       N                     bneg{,a} label            A1
BVC     1111  Branch on Overflow Clear                                 not V                 bvc{,a} label             A1
BVS     0111  Branch on Overflow Set                                   V                     bvs{,a} label             A1
† synonym: bnz   ‡ synonym: bz   § synonym: bgeu   ¶ synonym: blu

[Instruction format: op = 00 (bits 31:30) | a (29) | cond (28:25) | op2 = 010 (24:22) | disp22 (21:0)]


Programming | To set the annul (a) bit for Bicc instructions, append ",a" to the
Note | opcode mnemonic. For example, use "bgu,a label". In the
preceding table, braces signify that the ",a" is optional.


Unconditional branches and icc-conditional branches are described below: 


m Unconditional branches (BA, BN) — If its annul bit is 0 (a = 0), a BN (Branch 
Never) instruction is treated as a NOP. If its annul bit is 1 (a = 1), the following 
(delay) instruction is annulled (not executed). In neither case does a transfer of 
control take place. 


BA (Branch Always) causes an unconditional PC-relative, delayed control transfer
to the address "PC + (4 x sign_ext(disp22))". If the annul (a) bit of the branch
instruction is 1, the delay instruction is annulled (not executed). If the annul bit is
0 (a = 0), the delay instruction is executed.


m icc-conditional branches — Conditional Bicc instructions (all except BA and BN) 
evaluate the 32-bit integer condition codes (icc), according to the cond field of the 
instruction, producing either a TRUE or FALSE result. If TRUE, the branch is taken, 
that is, the instruction causes a PC-relative, delayed control transfer to the 
address "PC + (4 x sign_ext(disp22))". If FALSE, the branch is not taken.




















If a conditional branch is taken, the delay instruction is always executed 
regardless of the value of the annul field. If a conditional branch is not taken and 
the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 
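The icc tests in the Bicc opcode table and the annul rules above can be captured in a small C model. This is a sketch only. The names icc_t, bicc_taken and delay_slot_executes are hypothetical. The model relies on the observation that setting bit 3 of cond negates the test encoded in bits 2:0, which holds for every pair in the table.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool n, z, v, c; } icc_t;

/* Evaluate a Bicc cond field against icc: bits 2:0 select a test,
 * bit 3 inverts it (e.g. BE = 0001, BNE = 1001). */
static bool bicc_taken(unsigned cond, icc_t cc) {
    bool t;
    switch (cond & 7u) {
    case 0:  t = false;                  break; /* BN   / BA   */
    case 1:  t = cc.z;                   break; /* BE   / BNE  */
    case 2:  t = cc.z || (cc.n != cc.v); break; /* BLE  / BG   */
    case 3:  t = (cc.n != cc.v);         break; /* BL   / BGE  */
    case 4:  t = cc.c || cc.z;           break; /* BLEU / BGU  */
    case 5:  t = cc.c;                   break; /* BCS  / BCC  */
    case 6:  t = cc.n;                   break; /* BNEG / BPOS */
    default: t = cc.v;                   break; /* BVS  / BVC  */
    }
    return (cond & 8u) ? !t : t;
}

/* Does the delay-slot instruction execute? BA and BN annul it whenever
 * a = 1; conditional branches annul it only when the branch is not taken. */
static bool delay_slot_executes(unsigned cond, bool annul, icc_t cc) {
    if ((cond & 7u) == 0)
        return !annul;
    return bicc_taken(cond, cc) || !annul;
}
```

For instance, "ba,a" never executes its delay instruction, while "be,a" executes it only when icc.z is set.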


Annulment, delay instructions, and delayed control transfers are described further 
in Chapter 6, Instruction Set Overview. 


Exceptions None 
BMASK / BSHUFFLE





7.7 Byte Mask and Shuffle 


Instruction  opf          Operation                                                                     Assembly Language Syntax                Class
BMASK        0 0001 1001  Set the GSR.mask field in preparation for a subsequent BSHUFFLE instruction  bmask     reg_rs1, reg_rs2, reg_rd      C3
BSHUFFLE     0 0100 1100  Permute 16 bytes as specified by GSR.mask                                    bshuffle  freg_rs1, freg_rs2, freg_rd   C3


[Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description BMASK adds two integer registers, R[rs1] and R[rs2], and stores the result in the 
integer register R[rd]. The least significant 32 bits of the result are stored in the 
GSR.mask field. 


BSHUFFLE concatenates the two 64-bit floating-point registers F_D[rs1] (more
significant half) and F_D[rs2] (less significant half) to form a 128-bit (16-byte) value.
Bytes in the concatenated value are numbered from most significant to least
significant, with the most significant byte being byte 0. BSHUFFLE extracts 8 of
those 16 bytes and stores the result in the 64-bit floating-point register F_D[rd]. Bytes
in F_D[rd] are also numbered from most to least significant, with the most significant
being byte 0. The following table indicates which source byte is extracted from the
concatenated value to generate each byte in the destination register, F_D[rd].





Destination Byte (in F_D[rd])  Source Byte
0 (most significant)           (F_D[rs1] :: F_D[rs2]){GSR.mask{31:28}}
1                              (F_D[rs1] :: F_D[rs2]){GSR.mask{27:24}}
2                              (F_D[rs1] :: F_D[rs2]){GSR.mask{23:20}}
3                              (F_D[rs1] :: F_D[rs2]){GSR.mask{19:16}}
4                              (F_D[rs1] :: F_D[rs2]){GSR.mask{15:12}}
5                              (F_D[rs1] :: F_D[rs2]){GSR.mask{11:8}}
6                              (F_D[rs1] :: F_D[rs2]){GSR.mask{7:4}}
7 (least significant)          (F_D[rs1] :: F_D[rs2]){GSR.mask{3:0}}
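The table above can be sketched at the byte level in C. The sketch follows the same numbering convention, with byte 0 as the most significant byte. The names bmask and bshuffle are hypothetical, and the floating-point registers are modeled as plain 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* BMASK: R[rd] = R[rs1] + R[rs2]; the low 32 bits become GSR.mask. */
static uint64_t bmask(uint64_t rs1, uint64_t rs2, uint32_t *gsr_mask) {
    uint64_t sum = rs1 + rs2;
    *gsr_mask = (uint32_t)sum;
    return sum;
}

/* BSHUFFLE: destination byte i takes the source byte selected by
 * GSR.mask{31-4i:28-4i} from the 16-byte value (hi :: lo). */
static uint64_t bshuffle(uint64_t hi, uint64_t lo, uint32_t gsr_mask) {
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sel = (gsr_mask >> (28 - 4 * i)) & 0xFu;      /* source byte 0..15 */
        uint64_t src = (sel < 8) ? hi : lo;                    /* bytes 0..7 live in hi */
        uint64_t byte = (src >> (56 - 8 * (sel & 7u))) & 0xFFu;
        result |= byte << (56 - 8 * i);                        /* place as byte i */
    }
    return result;
}
```

A mask of 0x01234567 reproduces F_D[rs1] unchanged, 0x89ABCDEF selects F_D[rs2], and 0x76543210 byte-reverses F_D[rs1].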


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no
FPU is present, an attempt to execute a BMASK or BSHUFFLE instruction causes an
fp_disabled exception.


Exceptions fp_disabled
BPcc





7.8 Branch on Integer Condition Codes with 
Prediction (BPcc) 











Instruction  cond  Operation                                                cc Test               Assembly Language Syntax              Class
BPA          1000  Branch Always                                            1                     ba{,a}{,pt|,pn} i_or_x_cc, label      A1
BPN          0000  Branch Never                                             0                     bn{,a}{,pt|,pn} i_or_x_cc, label      A1
BPNE         1001  Branch on Not Equal                                      not Z                 bne†{,a}{,pt|,pn} i_or_x_cc, label    A1
BPE          0001  Branch on Equal                                          Z                     be‡{,a}{,pt|,pn} i_or_x_cc, label     A1
BPG          1010  Branch on Greater                                        not (Z or (N xor V))  bg{,a}{,pt|,pn} i_or_x_cc, label      A1
BPLE         0010  Branch on Less or Equal                                  Z or (N xor V)        ble{,a}{,pt|,pn} i_or_x_cc, label     A1
BPGE         1011  Branch on Greater or Equal                               not (N xor V)         bge{,a}{,pt|,pn} i_or_x_cc, label     A1
BPL          0011  Branch on Less                                           N xor V               bl{,a}{,pt|,pn} i_or_x_cc, label      A1
BPGU         1100  Branch on Greater Unsigned                               not (C or Z)          bgu{,a}{,pt|,pn} i_or_x_cc, label     A1
BPLEU        0100  Branch on Less or Equal Unsigned                         C or Z                bleu{,a}{,pt|,pn} i_or_x_cc, label    A1
BPCC         1101  Branch on Carry Clear (Greater than or Equal, Unsigned)  not C                 bcc§{,a}{,pt|,pn} i_or_x_cc, label    A1
BPCS         0101  Branch on Carry Set (Less than, Unsigned)                C                     bcs¶{,a}{,pt|,pn} i_or_x_cc, label    A1
BPPOS        1110  Branch on Positive                                       not N                 bpos{,a}{,pt|,pn} i_or_x_cc, label    A1
BPNEG        0110  Branch on Negative                                       N                     bneg{,a}{,pt|,pn} i_or_x_cc, label    A1
BPVC         1111  Branch on Overflow Clear                                 not V                 bvc{,a}{,pt|,pn} i_or_x_cc, label     A1
BPVS         0111  Branch on Overflow Set                                   V                     bvs{,a}{,pt|,pn} i_or_x_cc, label     A1
† synonym: bnz   ‡ synonym: bz   § synonym: bgeu   ¶ synonym: blu

[Instruction format: op = 00 (bits 31:30) | a (29) | cond (28:25) | op2 = 001 (24:22) | cc1 (21) | cc0 (20) | p (19) | disp19 (18:0)]

cc1  cc0  Condition Code
0    0    icc
0    1    (reserved)
1    0    xcc
1    1    (reserved)


CHAPTER 7 * Instructions 159 




Programming Note | To set the annul (a) bit for BPcc instructions, append ",a" to the
opcode mnemonic. For example, use "bgu,a %icc, label". Braces in the preceding table
signify that the ",a" is optional. To set the branch prediction bit, append either
",pt" for predict taken or ",pn" for predict not taken to the opcode mnemonic. If
neither ",pt" nor ",pn" is specified, the assembler defaults to ",pt". To select the
appropriate integer condition code, include "%icc" or "%xcc" before the label.





Description Unconditional branches and conditional branches are described below. 


• Unconditional branches (BPA, BPN): A BPN (Branch Never with Prediction)
instruction for this branch type (op2 = 1) may be used in the SPARC V9
architecture as an instruction prefetch; that is, the effective address
(PC + (4 × sign_ext(disp19))) specifies an address of an instruction that is expected
to be executed soon. If the Branch Never's annul bit is 1 (a = 1), then the following
(delay) instruction is annulled (not executed). If the annul bit is 0 (a = 0), then the
following instruction is executed. In no case does a Branch Never cause a transfer
of control to take place.

BPA (Branch Always with Prediction) causes an unconditional PC-relative,
delayed control transfer to the address "PC + (4 × sign_ext(disp19))". If the annul
bit of the branch instruction is 1 (a = 1), then the delay instruction is annulled (not
executed). If the annul bit is 0 (a = 0), then the delay instruction is executed.

• Conditional branches: Conditional BPcc instructions (except BPA and BPN)
evaluate one of the two integer condition codes (icc or xcc), as selected by cc0
and cc1, according to the cond field of the instruction, producing either a TRUE or
FALSE result. If TRUE, the branch is taken; that is, the instruction causes a
PC-relative, delayed control transfer to the address "PC + (4 × sign_ext(disp19))". If
FALSE, the branch is not taken.




















If a conditional branch is taken, the delay instruction is always executed 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 
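The cc tests in the BPcc table, together with the annul rules just described, can be sketched in C as below. This is an illustrative model only: the function names are hypothetical, the N, Z, V, and C flags are passed as booleans taken from whichever of icc or xcc cc1:cc0 selects, and cond is the 4-bit field from bits 28:25.

```c
#include <stdbool.h>

/* Evaluates the cond field (bits 28:25) against the selected condition codes. */
bool bpcc_condition(unsigned cond, bool n, bool z, bool v, bool c) {
    bool lt = (n != v);                 /* N xor V */
    switch (cond & 0xF) {
    case 0x8: return true;              /* BPA   */
    case 0x0: return false;             /* BPN   */
    case 0x9: return !z;                /* BPNE  */
    case 0x1: return z;                 /* BPE   */
    case 0xA: return !(z || lt);        /* BPG   */
    case 0x2: return z || lt;           /* BPLE  */
    case 0xB: return !lt;               /* BPGE  */
    case 0x3: return lt;                /* BPL   */
    case 0xC: return !(c || z);         /* BPGU  */
    case 0x4: return c || z;            /* BPLEU */
    case 0xD: return !c;                /* BPCC  */
    case 0x5: return c;                 /* BPCS  */
    case 0xE: return !n;                /* BPPOS */
    case 0x6: return n;                 /* BPNEG */
    case 0xF: return !v;                /* BPVC  */
    default:  return v;                 /* 0x7: BPVS */
    }
}

/* Returns true if the delay-slot instruction is executed. */
bool delay_slot_executed(unsigned cond, bool taken, bool a) {
    bool unconditional = (cond & 0x7) == 0;  /* BPA (1000) or BPN (0000) */
    if (unconditional)
        return !a;                           /* a = 1 annuls the delay slot of BPA and BPN */
    return taken || !a;                      /* conditional: annulled only if not taken and a = 1 */
}
```

BPA and BPN are the only encodings whose low three cond bits are zero, which is why the sketch uses that test to pick the unconditional annul rule.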


The predict bit (p) is used to give the hardware a hint about whether the branch is 
expected to be taken. A 1 in the p bit indicates that the branch is expected to be 
taken; a 0 indicates that the branch is expected not to be taken. 


Annulment, delay instructions, prediction, and delayed control transfers are 
described further in Chapter 6, Instruction Set Overview. 


An attempt to execute a BPcc instruction with cc0 = 1 (a reserved value) causes an 
illegal instruction exception. 


Exceptions illegal instruction 


See Also Branch on Integer Register with Prediction (BPr) on page 162 
BPr





7.9 Branch on Integer Register with 
Prediction (BPr) 





Instruction  rcond  Operation                                          Register Contents Test  Assembly Language Syntax             Class
-            000    Reserved                                           -                       -
BRZ          001    Branch on Register Zero                            R[rs1] = 0              brz{,a}{,pt|,pn}    reg_rs1, label   A1
BRLEZ        010    Branch on Register Less Than or Equal to Zero      R[rs1] ≤ 0              brlez{,a}{,pt|,pn}  reg_rs1, label   A1
BRLZ         011    Branch on Register Less Than Zero                  R[rs1] < 0              brlz{,a}{,pt|,pn}   reg_rs1, label   A1
-            100    Reserved                                           -                       -
BRNZ         101    Branch on Register Not Zero                        R[rs1] ≠ 0              brnz{,a}{,pt|,pn}   reg_rs1, label   A1
BRGZ         110    Branch on Register Greater Than Zero               R[rs1] > 0              brgz{,a}{,pt|,pn}   reg_rs1, label   A1
BRGEZ        111    Branch on Register Greater Than or Equal to Zero   R[rs1] ≥ 0              brgez{,a}{,pt|,pn}  reg_rs1, label   A1

Format: op = 00 (bits 31:30) | a (29) | 0† (28) | rcond (27:25) | op2 = 011 (24:22) | d16hi (21:20) | p (19) | rs1 (18:14) | d16lo (13:0)

† Although SPARC V9 implementations should cause an illegal instruction exception when bit 28 = 1, many
early implementations ignored the value of this bit and executed the opcode as a BPr instruction even if
bit 28 = 1.

Programming Note | To set the annul (a) bit for BPr instructions, append ",a" to the
opcode mnemonic. For example, use "brz,a %i3, label". In the preceding table, braces
signify that the ",a" is optional. To set the branch prediction bit p, append either
",pt" for predict taken or ",pn" for predict not taken to the opcode mnemonic. If
neither ",pt" nor ",pn" is specified, the assembler defaults to ",pt".


Description These instructions branch based on the contents of R[rs1]. They treat the register
contents as a signed integer value.


A BPr instruction examines all 64 bits of R[rs1] according to the rcond field of the
instruction, producing either a TRUE or FALSE result. If TRUE, the branch is taken;
that is, the instruction causes a PC-relative, delayed control transfer to the address
"PC + (4 × sign_ext(d16hi :: d16lo))". If FALSE, the branch is not taken.
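The rcond evaluation and target computation just described can be sketched in C as follows. It is a minimal model under stated assumptions: the function names are hypothetical, R[rs1] is passed as an int64_t (matching the signed interpretation in the text), and the bit-28 check is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluates rcond against the signed 64-bit contents of R[rs1].
   Sets *reserved for rcond 000 and 100, which cause an illegal instruction exception. */
bool bpr_condition(unsigned rcond, int64_t rs1_value, bool *reserved) {
    *reserved = false;
    switch (rcond & 7) {
    case 1: return rs1_value == 0;  /* BRZ   */
    case 2: return rs1_value <= 0;  /* BRLEZ */
    case 3: return rs1_value < 0;   /* BRLZ  */
    case 5: return rs1_value != 0;  /* BRNZ  */
    case 6: return rs1_value > 0;   /* BRGZ  */
    case 7: return rs1_value >= 0;  /* BRGEZ */
    default:                        /* 000 and 100 are reserved */
        *reserved = true;
        return false;
    }
}

/* Branch target PC + (4 x sign_ext(d16hi :: d16lo)), wrapping modulo 2^64. */
uint64_t bpr_target(uint64_t pc, unsigned d16hi, unsigned d16lo) {
    uint32_t d16 = ((d16hi & 0x3u) << 14) | (d16lo & 0x3FFFu);
    int64_t disp = (int64_t)(d16 ^ 0x8000u) - 0x8000;  /* sign-extend 16 bits */
    return pc + (uint64_t)disp * 4u;
}
```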




















If the branch is taken, the delay instruction is always executed, regardless of the 
value of the annul (a) bit. If the branch is not taken and the annul bit is 1 (a = 1), the 
delay instruction is annulled (not executed). 
The predict bit (p) gives the hardware a hint about whether the branch is expected to
be taken. If p = 1, the branch is expected to be taken; if p = 0, the branch is expected
not to be taken.


An attempt to execute a BPr instruction when instruction bit 28 = 1 or rcond is a
reserved value (000₂ or 100₂) causes an illegal instruction exception.


Annulment, delay instructions, prediction, and delayed control transfers are
described further in Chapter 6, Instruction Set Overview.


Implementation Note | If this instruction is implemented by tagging each register value
with an N (negative) bit and a Z (zero) bit, the table below can be used to determine
whether rcond is TRUE:

Branch   Test
BRNZ     not Z
BRZ      Z
BRGEZ    not N
BRLZ     N
BRLEZ    N or Z
BRGZ     not (N or Z)


Exceptions illegal instruction


See Also Branch on Integer Condition Codes with Prediction (BPcc) on page 159




CALL 





7.10 Call and Link 


Instruction  op  Operation      Assembly Language Syntax  Class
CALL         01  Call and Link  call label                A1

Format: op = 01 (bits 31:30) | disp30 (29:0)


Description The CALL instruction causes an unconditional, delayed, PC-relative control transfer
to address PC + (4 × sign_ext(disp30)). Since the word displacement (disp30) field is
30 bits wide, the target address lies within a range of −2³¹ to +2³¹ − 4 bytes. The
PC-relative displacement is formed by sign-extending the 30-bit word displacement field
to 62 bits and appending two low-order zeroes to obtain a 64-bit byte displacement.


The CALL instruction also writes the value of PC, which contains the address of the 
CALL, into R[15] (out register 7). 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address
are masked out (set to 0) before being sent to the memory system, and they are also
masked out in the address written into R[15]. (closed impl. dep. #125-V9-Cs10)
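The target computation and the PSTATE.am masking described above can be modeled in C as below. This is a sketch under assumptions: the function names are hypothetical, addresses wrap modulo 2^64, and only the address arithmetic is modeled (no fetch or delay slot).

```c
#include <stdbool.h>
#include <stdint.h>

/* CALL target: PC + (4 x sign_ext(disp30)), masked to 32 bits when PSTATE.am = 1. */
uint64_t call_target(uint64_t pc, uint32_t disp30, bool pstate_am) {
    /* Sign-extend the 30-bit word displacement. */
    int64_t disp = (int64_t)((disp30 & 0x3FFFFFFFu) ^ 0x20000000u) - 0x20000000;
    uint64_t target = pc + (uint64_t)disp * 4u;   /* word displacement -> bytes */
    return pstate_am ? (target & 0xFFFFFFFFull) : target;
}

/* Value written to R[15] (out register 7): the address of the CALL itself. */
uint64_t call_link_value(uint64_t pc, bool pstate_am) {
    return pstate_am ? (pc & 0xFFFFFFFFull) : pc;
}
```

The extreme displacements 0x1FFFFFFF and 0x20000000 give byte offsets of +2³¹ − 4 and −2³¹, matching the stated range.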


Exceptions None 


See Also JMPL on page 241 
CASA / CASXA





7.11 Compare and Swap


Instruction   op3     Operation                                        Assembly Language Syntax                  Class
CASA^PASI     111100  Compare and Swap Word from Alternate Space       casa  [reg_rs1] imm_asi, reg_rs2, reg_rd  A1
                                                                       casa  [reg_rs1] %asi, reg_rs2, reg_rd
CASXA^PASI    111110  Compare and Swap Extended from Alternate Space   casxa [reg_rs1] imm_asi, reg_rs2, reg_rd  A1
                                                                       casxa [reg_rs1] %asi, reg_rs2, reg_rd

Format (i = 0): op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | imm_asi (12:5) | rs2 (4:0)
Format (i = 1): op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 1 (13) | reserved (12:5) | rs2 (4:0)


Description Concurrent processes use these instructions for synchronization and memory
updates. Uses of compare-and-swap include spin-lock operations, updates of shared
counters, and updates of linked-list pointers. The last two can use wait-free
(nonlocking) protocols.
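As an illustration of a nonlocking shared-counter update, the C11 sketch below uses atomic_compare_exchange_weak in a retry loop. Whether a given compiler lowers this call to CASXA on a SPARC V9 target is a toolchain assumption, not something this specification states; the function name atomic_add_cas is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Adds delta to *counter without a lock; returns the value seen before the add. */
uint64_t atomic_add_cas(_Atomic uint64_t *counter, uint64_t delta) {
    uint64_t expected = atomic_load_explicit(counter, memory_order_relaxed);
    /* On failure, the call copies the current memory value into expected,
       much as CASXA returns the old memory value in R[rd]; the loop then retries. */
    while (!atomic_compare_exchange_weak(counter, &expected, expected + delta)) {
    }
    return expected;
}
```

This loop is lock-free rather than wait-free: no thread ever holds a lock, but an individual thread may retry repeatedly under heavy contention.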


The CASXA instruction compares the value in register R[rs2] with the doubleword
in memory pointed to by the doubleword address in R[rs1]. If the values are equal,
the value in R[rd] is swapped with the doubleword pointed to by the doubleword
address in R[rs1]. If the values are not equal, the doubleword pointed to by R[rs1]
replaces the value in R[rd], but the memory location remains unchanged.


The CASA instruction compares the low-order 32 bits of register R[rs2] with a word 
in memory pointed to by the word address in R[rs1]. If the values are equal, then the 
low-order 32 bits of register R[rd] are swapped with the contents of the memory 
word pointed to by the address in R[rs1] and the high-order 32 bits of register R[rd] 
are set to 0. If the values are not equal, the memory location remains unchanged, but 
the contents of the memory word pointed to by R[rs1] replace the low-order 32 bits 
of R[rd] and the high-order 32 bits of register R[rd] are set to 0. 
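The register and memory effects described for CASXA and CASA can be captured in a single-threaded C model such as the one below. It does not model atomicity, ASIs, or exceptions, and the function names are hypothetical. Both functions always write memory, storing the previous value back when the comparison fails, because the specification states that compare-and-swap behaves as if it performs a store in either case.

```c
#include <stdint.h>

/* CASXA: compare R[rs2] with the doubleword at *mem; swap with R[rd] if equal. */
void casxa_model(uint64_t *mem, uint64_t rs2, uint64_t *rd) {
    uint64_t old = *mem;
    *mem = (old == rs2) ? *rd : old;  /* always a store: new value or previous value */
    *rd = old;                        /* R[rd] always receives the old memory value */
}

/* CASA: compare the low-order 32 bits of R[rs2] with the word at *mem. */
void casa_model(uint32_t *mem, uint64_t rs2, uint64_t *rd) {
    uint32_t old = *mem;
    *mem = (old == (uint32_t)rs2) ? (uint32_t)*rd : old;
    *rd = (uint64_t)old;              /* high-order 32 bits of R[rd] become 0 */
}
```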


A compare-and-swap instruction comprises three operations: a load, a compare, and 
a swap. The overall instruction is atomic; that is, no intervening interrupts or 
deferred traps are recognized by the virtual processor and no intervening update 
resulting from a compare-and-swap, swap, load, load-store unsigned byte, or store 
instruction to the doubleword containing the addressed location, or any portion of it, 
is performed by the memory system. 


A compare-and-swap operation does not imply any memory barrier semantics. 
When compare-and-swap is used for synchronization, the same consideration 
should be given to memory barriers as if a load, store, or swap instruction were 
used. 
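Because compare-and-swap carries no barrier semantics, a lock built on it must request ordering explicitly. The C11 spin-lock sketch below expresses that ordering with acquire and release memory orders, which stand in for the memory barriers discussed above. The type and function names are hypothetical, and how a compiler maps these orders to SPARC MEMBAR instructions is outside the scope of this text.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { atomic_uint word; } cas_spinlock;

bool cas_spin_trylock(cas_spinlock *l) {
    unsigned expected = 0;
    /* Acquire on success: accesses after a successful lock cannot be
       reordered before it. No ordering is needed on failure. */
    return atomic_compare_exchange_strong_explicit(&l->word, &expected, 1u,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

void cas_spin_lock(cas_spinlock *l) {
    while (!cas_spin_trylock(l)) {
        /* spin until the holder releases the lock */
    }
}

void cas_spin_unlock(cas_spinlock *l) {
    /* Release: accesses in the critical section complete before the unlocking store. */
    atomic_store_explicit(&l->word, 0u, memory_order_release);
}
```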


A compare-and-swap operation behaves as if it performs a store, either of a new
value from R[rd] or of the previous value in memory. The addressed location must
be writable, even if the values in memory and R[rs2] are not equal.


If i = 0, the address space of the memory location is specified in the imm_asi field; if
i = 1, the address space is specified in the ASI register.


An attempt to execute a CASXA or CASA instruction when i = 1 and instruction bits 
12:5 are nonzero causes an illegal instruction exception. 


A mem address not aligned exception is generated if the address in R[rs1] is not
properly aligned (word-aligned for CASA, doubleword-aligned for CASXA).


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, CASXA and CASA cause a privileged action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆,
CASXA and CASA cause a privileged action exception.
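The two privilege rules above amount to a simple test on the ASI number, sketched in C below. The function name and boolean parameters are hypothetical. The sketch covers only the two cases the text lists and reports no privileged action exception in hyperprivileged mode, which the paragraph above does not address.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if CASA/CASXA with this ASI causes a privileged action exception. */
bool cas_privileged_action(bool pstate_priv, bool hpstate_hpriv, uint8_t asi) {
    if (hpstate_hpriv)
        return false;                   /* not covered by the rules above (assumption) */
    if (!pstate_priv)
        return (asi & 0x80) == 0;       /* nonprivileged: ASI bit 7 = 0 */
    return asi >= 0x30 && asi <= 0x7F;  /* privileged: ASIs 30 through 7F (hex) */
}
```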


Compatibility | An implementation might cause an exception because of an 
Note | error during the store memory access, even though there was no 
error during the load memory access. 


Programming | Compare and Swap (CAS) and Compare and Swap Extended 

Note | (CASX) synthetic instructions are available for “big endian” 
memory accesses. Compare and Swap Little (CASL) and Compare 
and Swap Extended Little (CASXL) synthetic instructions are 
available for "little endian" memory accesses. See Synthetic 
Instructions on page 536 for the syntax of these synthetic 
instructions. 





The compare-and-swap instructions do not affect the condition codes. 


The compare-and-swap instructions can be used with any of the following ASIs,
subject to the privilege mode rules described for the privileged action exception
above. Use of any other ASI with these instructions causes a data access exception.





ASIs valid for CASA and CASXA instructions

ASI_AS_IF_PRIV_PRIMARY        ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY      ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                      ASI_REAL_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE


Exceptions illegal instruction
           mem address not aligned
           privileged action
           VA watchpoint
           data access exception
           fast data access MMU miss
           data access MMU miss
           data access MMU error
           fast data access protection







DONE 





7.12 DONE 





Instruction  op3     Operation                                    Assembly Language Syntax  Class
DONE^P       111110  Return from Trap (skip trapped instruction)  done                      A1

Format: op = 10 (bits 31:30) | fcn = 0 (29:25) | op3 = 111110 (24:19) | reserved (18:0)


Description The DONE instruction restores the saved state from TSTATE[TL] (GL, CCR, ASI,
PSTATE, and CWP) and HTSTATE[TL] (HPSTATE), sets PC and NPC, and decrements
TL. DONE sets PC ← TNPC[TL] and NPC ← TNPC[TL] + 4 (normally, the value of
NPC saved at the time of the original trap and the address of the instruction
immediately after the one referenced by that NPC).


Programming | The DONE and RETRY instructions are used to return from 
Notes | privileged trap handlers. 


Unlike RETRY, DONE ignores the contents of TPC[TL]. 


If the saved TNPC[TL] was not altered by trap handler software, DONE causes 
execution to resume immediately after the instruction that originally caused the trap 
(as if that instruction was “done” executing). 


Execution of a DONE instruction in the delay slot of a control-transfer instruction 
produces undefined results. 


When a DONE instruction is executed in privileged mode and 
HTSTATE[TL].hpstate.hpriv = 0 (which will cause the DONE to return the virtual 
processor to nonprivileged or privileged mode), the value of GL restored from 
TSTATE[TL] saturates at MAXPGL. That is, if the value in TSTATE[TL].gl is greater 
than MAXPGL, then MAXPGL is substituted and written to GL. This protects against 
non-hyperprivileged software executing with GL > MAXPGL. 
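The PC, NPC, TL, and GL updates described above can be sketched in C as follows. The struct and function names are hypothetical, only a subset of the restored state is modeled (CCR, ASI, PSTATE, CWP, and PSTATE.am masking are omitted), and MAXPGL is passed as a parameter rather than hard-coded. The sketch assumes DONE has already passed its exception checks, so it is executing in privileged or hyperprivileged mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified subset of the state that DONE updates. */
struct done_state {
    uint64_t pc, npc;
    unsigned tl;   /* trap level */
    unsigned gl;   /* global register level */
    bool hpriv;    /* HPSTATE.hpriv (current mode) */
};

/* tnpc = TNPC[TL]; saved_gl = TSTATE[TL].gl; saved_hpriv = HTSTATE[TL].hpstate.hpriv. */
void done_model(struct done_state *s, uint64_t tnpc, unsigned saved_gl,
                bool saved_hpriv, unsigned maxpgl) {
    bool executing_privileged = !s->hpriv;  /* privileged, not hyperprivileged */
    s->pc = tnpc;                           /* PC <- TNPC[TL]; TPC[TL] is ignored */
    s->npc = tnpc + 4;                      /* NPC <- TNPC[TL] + 4 */
    /* GL saturates at MAXPGL when privileged code returns to a non-hyperprivileged mode. */
    if (executing_privileged && !saved_hpriv && saved_gl > maxpgl)
        s->gl = maxpgl;
    else
        s->gl = saved_gl;
    s->hpriv = saved_hpriv;                 /* HPSTATE restored from HTSTATE[TL] */
    s->tl -= 1;                             /* TL is decremented */
}
```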


If software writes invalid or inconsistent state to TSTATE or HTSTATE before 
executing DONE, virtual processor behavior during and after execution of the 
DONE instruction is undefined. 


The DONE instruction does not provide an error barrier, as MEMBAR #Sync does 
(impl. dep. #215-U3). 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system. 
IMPL. DEP. #417-S10: If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE instruction
is executed (which sets PSTATE.am to 1 by restoring the value from
TSTATE[TL].pstate.am to PSTATE.am), it is implementation dependent whether the
DONE instruction masks (zeroes) the more-significant 32 bits of the values it places
into PC and NPC.


Exceptions. In privileged mode (PSTATE.priv = 1 and HPSTATE.hpriv = 0) or
hyperprivileged mode (HPSTATE.hpriv = 1), an attempt to execute DONE while
TL = 0 causes an illegal instruction exception. An attempt to execute DONE (in any
mode) with instruction bits 18:0 nonzero causes an illegal instruction exception.


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), an attempt to
execute DONE causes a privileged opcode exception.


Implementation Note | In nonprivileged mode, the illegal instruction exception due to
TL = 0 does not occur. The privileged opcode exception occurs instead, regardless of
the current trap level (TL).


A trap level zero disrupting trap can occur upon the completion of a DONE
instruction if the following three conditions are true after DONE has executed:
• trap level zero exceptions are enabled (HPSTATE.tlz = 1),
• the virtual processor is in nonprivileged or privileged mode
  (HPSTATE.hpriv = 0), and
• the trap level (TL) register's value is zero (TL = 0).


Exceptions illegal instruction
           privileged opcode
           trap level zero


See Also RETRY on page 313
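The exception rules in the Exceptions paragraph for DONE can be summarized as a decision function, sketched below in C. The enum and function names are hypothetical. Checking the reserved bits 18:0 first is an assumption drawn from the phrase "in any mode"; the authoritative exception priorities are defined elsewhere in this specification. The DONE encoding used in the usage example, 0x81F00000, follows from op = 10, fcn = 0, and op3 = 111110.

```c
#include <stdbool.h>
#include <stdint.h>

enum done_exception { DONE_NO_EXCEPTION, DONE_ILLEGAL_INSTRUCTION, DONE_PRIVILEGED_OPCODE };

enum done_exception done_check(bool pstate_priv, bool hpstate_hpriv,
                               unsigned tl, uint32_t insn) {
    if ((insn & 0x7FFFFu) != 0)          /* bits 18:0 nonzero: illegal in any mode */
        return DONE_ILLEGAL_INSTRUCTION;
    if (!pstate_priv && !hpstate_hpriv)  /* nonprivileged: regardless of TL */
        return DONE_PRIVILEGED_OPCODE;
    if (tl == 0)                         /* privileged or hyperprivileged with TL = 0 */
        return DONE_ILLEGAL_INSTRUCTION;
    return DONE_NO_EXCEPTION;
}
```

For example, done_check(false, false, 0, 0x81F00000u) reports a privileged opcode exception, matching the Implementation Note that TL = 0 does not produce an illegal instruction exception in nonprivileged mode.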




EDGE<8|16|32>{L}cc 





7.13 Edge Handling Instructions 


Instruction  opf          Operation                                            Assembly Language Syntax†            Class
EDGE8cc      0 0000 0000  Eight 8-bit edge boundary processing                 edge8cc   reg_rs1, reg_rs2, reg_rd   C3
EDGE8Lcc     0 0000 0010  Eight 8-bit edge boundary processing, little-endian  edge8lcc  reg_rs1, reg_rs2, reg_rd   C3
EDGE16cc     0 0000 0100  Four 16-bit edge boundary processing                 edge16cc  reg_rs1, reg_rs2, reg_rd   C3
EDGE16Lcc    0 0000 0110  Four 16-bit edge boundary processing, little-endian  edge16lcc reg_rs1, reg_rs2, reg_rd   C3
EDGE32cc     0 0000 1000  Two 32-bit edge boundary processing                  edge32cc  reg_rs1, reg_rs2, reg_rd   C3
EDGE32Lcc    0 0000 1010  Two 32-bit edge boundary processing, little-endian   edge32lcc reg_rs1, reg_rs2, reg_rd   C3

† The original assembly language mnemonics for these instructions did not include the "cc" suffix that appears in the names of all other
instructions that set the integer condition codes. The old, non-"cc" mnemonics are deprecated. Over time, assemblers will support
the new mnemonics for these instructions. In the meantime, some older assemblers may recognize only the mnemonics without "cc".

Format: op = 10 (bits 31:30) | rd (29:25) | op3 = 110110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description These instructions handle the boundary conditions for parallel pixel scan line loops, 
where R[rs1] is the address of the next pixel to render and R[rs2] is the address of 
the last pixel in the scan line. 


EDGE8Lcc, EDGE16Lcc, and EDGE32Lcc are little-endian versions of EDGE8cc,
EDGE16cc, and EDGE32cc. They produce an edge mask that is bit-reversed from
their big-endian counterparts but are otherwise identical. This makes the mask
consistent with the mask produced by the Partial Store instruction (see Partial Store
on page 298) on little-endian data.


A 2-bit (EDGE32cc), 4-bit (EDGE16cc), or 8-bit (EDGE8cc) pixel mask is stored in the 
least significant bits of R[rd]. The mask is computed from left and right edge masks 
as follows: 


1. The left edge mask is computed from the 3 least significant bits of R[rs1] and the 
right edge mask is computed from the 3 least significant bits of R[rs2], according 
to TABLE 7-6. 


2. If 32-bit address masking is disabled (PSTATE.am = 0, 64-bit addressing) and
the upper 61 bits of R[rs1] are equal to the corresponding bits in R[rs2], R[rd] is
set to the right edge mask ANDed with the left edge mask.


3. If 32-bit address masking is enabled (PSTATE.am = 1, 32-bit addressing) and bits
31:3 of R[rs1] match bits 31:3 of R[rs2], R[rd] is set to the right edge mask ANDed
with the left edge mask.


4. Otherwise, R[rd] is set to the left edge mask. 
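The four steps above can be sketched in C as follows. This is an illustrative model, not the hardware definition: the function name edge_mask_model is hypothetical, the left and right masks are generated arithmetically rather than looked up in TABLE 7-6 (the formulas reproduce every row of that table), the little-endian forms simply bit-reverse both masks, and the integer condition codes set by the cc forms are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverses the low n bits of m (n <= 8). */
static uint32_t reverse_bits(uint32_t m, unsigned n) {
    uint32_t r = 0;
    for (unsigned i = 0; i < n; i++)
        if (m & (1u << i))
            r |= 1u << (n - 1 - i);
    return r;
}

/* size_bytes: 1 (EDGE8), 2 (EDGE16), or 4 (EDGE32). */
uint32_t edge_mask_model(unsigned size_bytes, bool little_endian,
                         uint64_t rs1, uint64_t rs2, bool pstate_am) {
    unsigned lanes = 8 / size_bytes;              /* 8, 4, or 2 mask bits */
    uint32_t all = (1u << lanes) - 1;             /* e.g. 0xFF for EDGE8 */
    unsigned idx_l = (unsigned)(rs1 & 7) / size_bytes;
    unsigned idx_r = (unsigned)(rs2 & 7) / size_bytes;

    /* Step 1: left mask from R[rs1]{2:0}, right mask from R[rs2]{2:0}. */
    uint32_t left = all >> idx_l;
    uint32_t right = (all << (lanes - 1 - idx_r)) & all;
    if (little_endian) {
        left = reverse_bits(left, lanes);
        right = reverse_bits(right, lanes);
    }

    /* Steps 2 and 3: compare bits 63:3 (am = 0) or bits 31:3 (am = 1). */
    uint64_t cmp_mask = pstate_am ? 0xFFFFFFF8ull : ~7ull;
    if (((rs1 ^ rs2) & cmp_mask) == 0)
        return left & right;
    return left;                                  /* Step 4 */
}
```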


The integer condition codes are set per the rules of the SUBcc instruction with the 
same operands (see Subtract on page 303). 


TABLE 7-6 lists edge mask specifications. 


TABLE 7-6 Edge Mask Specification 








Edge   R[rs1]{2:0} or   Big Endian               Little Endian
Size   R[rs2]{2:0}      Left Edge   Right Edge   Left Edge   Right Edge
8      000              1111 1111   1000 0000    1111 1111   0000 0001
8      001              0111 1111   1100 0000    1111 1110   0000 0011
8      010              0011 1111   1110 0000    1111 1100   0000 0111
8      011              0001 1111   1111 0000    1111 1000   0000 1111
8      100              0000 1111   1111 1000    1111 0000   0001 1111
8      101              0000 0111   1111 1100    1110 0000   0011 1111
8      110              0000 0011   1111 1110    1100 0000   0111 1111
8      111              0000 0001   1111 1111    1000 0000   1111 1111
16     00x              1111        1000         1111        0001
16     01x              0111        1100         1110        0011
16     10x              0011        1110         1100        0111
16     11x              0001        1111         1000        1111
32     0xx              11          10           11          01
32     1xx              01          11           10          11


Exceptions illegal instruction


See Also EDGE<8|16|32>{L}N on page 172
7.14 Edge Handling Instructions (no CC)


Instruction  opf          Operation                                                   Assembly Language Syntax            Class
EDGE8N       0 0000 0001  Eight 8-bit edge boundary processing, no CC                 edge8n   reg_rs1, reg_rs2, reg_rd   C3
EDGE8LN      0 0000 0011  Eight 8-bit edge boundary processing, little-endian, no CC  edge8ln  reg_rs1, reg_rs2, reg_rd   C3
EDGE16N      0 0000 0101  Four 16-bit edge boundary processing, no CC                 edge16n  reg_rs1, reg_rs2, reg_rd   C3
EDGE16LN     0 0000 0111  Four 16-bit edge boundary processing, little-endian, no CC  edge16ln reg_rs1, reg_rs2, reg_rd   C3
EDGE32N      0 0000 1001  Two 32-bit edge boundary processing, no CC                  edge32n  reg_rs1, reg_rs2, reg_rd   C3
EDGE32LN     0 0000 1011  Two 32-bit edge boundary processing, little-endian, no CC   edge32ln reg_rs1, reg_rs2, reg_rd   C3

Format: op = 10 (bits 31:30) | rd (29:25) | op3 = 110110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description EDGE8[L]N, EDGE16[L]N, and EDGE32[L]N operate identically to EDGE8[L]cc,
EDGE16[L]cc, and EDGE32[L]cc, respectively, but do not set the integer condition
codes.


See Edge Handling Instructions on page 170 for details. 
Exceptions illegal instruction 


See Also EDGE<8|16|32>{L}cc on page 170
7.15 Floating-Point Absolute Value


Instruction  op3     opf          Operation              Assembly Language Syntax   Class
FABSs        110100  0 0000 1001  Absolute Value Single  fabss freg_rs2, freg_rd    A1
FABSd        110100  0 0000 1010  Absolute Value Double  fabsd freg_rs2, freg_rd    A1
FABSq        110100  0 0000 1011  Absolute Value Quad    fabsq freg_rs2, freg_rd    C3

Format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | reserved (18:14) | opf (13:5) | rs2 (4:0)


Description FABS copies the source floating-point register(s) to the destination floating-point
register(s), with the sign bit cleared (set to 0).


FABSs operates on single-precision (32-bit) floating-point registers, FABSd operates on
double-precision (64-bit) floating-point register pairs, and FABSq operates on
quad-precision (128-bit) floating-point register quadruples.


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FABSq instruction causes an
illegal instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FABS instruction when instruction bits 18:14 are nonzero
causes an illegal instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FABS instruction causes an fp disabled exception.


Exceptions illegal instruction
           fp disabled
           fp exception other (FSR.ftt = unimplemented FPop (FABSq))
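Because FABS only clears the sign bit and does not treat NaN values specially, its data effect can be sketched as a bit operation on raw register contents. The C model below works on bit patterns rather than C float values, which sidesteps any host floating-point behavior. The function names are hypothetical, and the FSR.cexc and FSR.ftt updates are not modeled.

```c
#include <stdint.h>

/* FABSs: clear bit 31 (the sign bit) of a single-precision bit pattern. */
uint32_t fabss_model(uint32_t rs2_bits) {
    return rs2_bits & 0x7FFFFFFFu;
}

/* FABSd: clear bit 63 (the sign bit) of a double-precision bit pattern. */
uint64_t fabsd_model(uint64_t rs2_bits) {
    return rs2_bits & 0x7FFFFFFFFFFFFFFFull;
}
```

For example, the single-precision pattern 0xBF800000 (−1.0) becomes 0x3F800000 (1.0), and a negative quiet NaN 0xFFC00000 simply becomes 0x7FC00000.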
7.16 Floating-Point Add


Instruction  op3     opf          Operation   Assembly Language Syntax              Class
FADDs        110100  0 0100 0001  Add Single  fadds freg_rs1, freg_rs2, freg_rd     A1
FADDd        110100  0 0100 0010  Add Double  faddd freg_rs1, freg_rs2, freg_rd     A1
FADDq        110100  0 0100 0011  Add Quad    faddq freg_rs1, freg_rs2, freg_rd     C3

Format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description The floating-point add instructions add the floating-point register(s) specified by the
rs1 field and the floating-point register(s) specified by the rs2 field. The instructions
then write the sum into the floating-point register(s) specified by the rd field.


Rounding is performed as specified by FSR.rd.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute a FADDq instruction causes an
illegal instruction exception, allowing privileged software to
emulate the instruction.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FADD instruction causes an fp disabled exception.


If the FPU is enabled, FADDq causes an fp exception other (with FSR.ftt =
unimplemented FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.


Note | An fp exception other with FSR.ftt = unfinished FPop can occur
if the operation detects unusual, implementation-specific
conditions.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-1985
Requirements for UltraSPARC Architecture 2005.


Exceptions illegal instruction
           fp disabled
           fp exception other (FSR.ftt = unimplemented FPop (FADDq))
           fp exception other (FSR.ftt = unfinished FPop)
           fp exception ieee 754 (OF, UF, NX, NV)
7.17 Align Data


Instruction  opf          Operation                                   Assembly Language Syntax                Class
FALIGNDATA   0 0100 1000  Perform data alignment for misaligned data  faligndata freg_rs1, freg_rs2, freg_rd  A1

Format: op = 10 (bits 31:30) | rd (29:25) | op3 = 110110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description  FALIGNDATA concatenates the two 64-bit floating-point registers specified by rs1 
and rs2 to form a 128-bit (16-byte) intermediate value. The contents of the first 
source operand form the more-significant 8 bytes of the intermediate value, and the 
contents of the second source operand form the less significant 8 bytes of the 
intermediate value. Bytes in the intermediate value are numbered from most 
significant (byte 0) to least significant (byte 15). Eight bytes are extracted from the 
intermediate value and stored in the 64-bit floating-point destination register 
specified by rd. GSR.align specifies the number of the most significant byte to extract 
(and, therefore, the least significant byte extracted is numbered GSR.align+7). 


GSR.align is normally set by a previous ALIGNADDRESS instruction. 
FIGURE 7-6 FALIGNDATA (GSR.align selects 8 consecutive bytes of the 16-byte value Fp[rs1] :: Fp[rs2])


A byte-aligned 64-bit load can be performed as shown below.


    alignaddr  Address, Offset, Address   !set GSR.align
    ldd        [Address], %d0
    ldd        [Address + 8], %d2
    faligndata %d0, %d2, %d4              !use GSR.align to select bytes
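The load sequence above can be modeled in C to show how FALIGNDATA reassembles an unaligned doubleword from two aligned ones. This sketch assumes, as described under Align Address, that alignaddr writes the sum of its two source operands with the low 3 bits cleared to its destination and writes those low 3 bits to GSR.align; the model receives that sum directly as addr. Memory is a byte array read in big-endian order, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* FALIGNDATA: bytes align..align+7 of hi :: lo (byte 0 = most significant byte of hi). */
uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned align) {
    align &= 7;                      /* GSR.align is a 3-bit field */
    if (align == 0)
        return hi;                   /* avoids an undefined 64-bit shift */
    return (hi << (8 * align)) | (lo >> (64 - 8 * align));
}

/* Big-endian 8-byte load, matching SPARC byte order. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Mirrors the sequence above: alignaddr, two aligned ldd, then faligndata. */
uint64_t unaligned_load_model(const uint8_t *mem, size_t addr) {
    size_t aligned = addr & ~(size_t)7;     /* address alignaddr writes to its rd */
    unsigned align = (unsigned)(addr & 7);  /* value alignaddr writes to GSR.align */
    uint64_t d0 = load_be64(mem + aligned);
    uint64_t d2 = load_be64(mem + aligned + 8);
    return faligndata_model(d0, d2, align);
}
```

Note that the model, like the instruction sequence, always reads 16 bytes starting at the aligned address, so the buffer must extend at least that far.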





If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FALIGNDATA instruction causes an fp disabled exception. 


Exceptions fp disabled 
See Also Align Address on page 149 
7.18 Branch on Floating-Point Condition Codes (FBfcc)


Opcode    cond  Operation                                 fcc Test      Assembly Language Syntax  Class
FBA^D     1000  Branch Always                             1             fba{,a}    label          A1
FBN^D     0000  Branch Never                              0             fbn{,a}    label          A1
FBU^D     0111  Branch on Unordered                       U             fbu{,a}    label          A1
FBG^D     0110  Branch on Greater                         G             fbg{,a}    label          A1
FBUG^D    0101  Branch on Unordered or Greater            G or U        fbug{,a}   label          A1
FBL^D     0100  Branch on Less                            L             fbl{,a}    label          A1
FBUL^D    0011  Branch on Unordered or Less               L or U        fbul{,a}   label          A1
FBLG^D    0010  Branch on Less or Greater                 L or G        fblg{,a}   label          A1
FBNE^D    0001  Branch on Not Equal                       L or G or U   fbne†{,a}  label          A1
FBE^D     1001  Branch on Equal                           E             fbe‡{,a}   label          A1
FBUE^D    1010  Branch on Unordered or Equal              E or U        fbue{,a}   label          A1
FBGE^D    1011  Branch on Greater or Equal                E or G        fbge{,a}   label          A1
FBUGE^D   1100  Branch on Unordered or Greater or Equal   E or G or U   fbuge{,a}  label          A1
FBLE^D    1101  Branch on Less or Equal                   E or L        fble{,a}   label          A1
FBULE^D   1110  Branch on Unordered or Less or Equal      E or L or U   fbule{,a}  label          A1
FBO^D     1111  Branch on Ordered                         E or L or G   fbo{,a}    label          A1

† synonym: fbnz   ‡ synonym: fbz

Format: op = 00 (bits 31:30) | a (29) | cond (28:25) | op2 = 110 (24:22) | disp22 (21:0)


Programming Note | To set the annul (a) bit for FBfcc instructions, append ",a" to
the opcode mnemonic. For example, use "fbl,a label". In the preceding table, braces
around ",a" signify that ",a" is optional.


Description Unconditional and Fcc branches are described below: 


• Unconditional branches (FBA, FBN): If its annul field is 0, an FBN (Branch
Never) instruction acts like a NOP. If its annul field is 1, the following (delay)
instruction is annulled (not executed) when the FBN is executed. In neither case
does a transfer of control take place.

FBA (Branch Always) causes a PC-relative, delayed control transfer to the address
"PC + (4 × sign_ext(disp22))" regardless of the value of the floating-point
condition code bits. If the annul field of the branch instruction is 1, the delay
instruction is annulled (not executed). If the annul (a) bit is 0, the delay
instruction is executed.


• Fcc-conditional branches: Conditional FBfcc instructions (except FBA and
FBN) evaluate floating-point condition code zero (fcc0) according to the cond
field of the instruction. Such evaluation produces either a TRUE or FALSE result.
If TRUE, the branch is taken; that is, the instruction causes a PC-relative, delayed
control transfer to the address "PC + (4 × sign_ext(disp22))". If FALSE, the branch
is not taken.




















If a conditional branch is taken, the delay instruction is always executed, 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 
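The fcc tests in the FBfcc table can be encoded compactly: each cond value accepts a subset of the four fcc outcomes. The C sketch below stores that subset as a 4-bit mask per cond. It assumes the standard SPARC fcc encoding (0 = E, 1 = L, 2 = G, 3 = U), which is defined with the FSR rather than in this section; the names, including the compare_to_fcc helper, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fcc encoding: 0 = E (equal), 1 = L (less), 2 = G (greater), 3 = U (unordered). */
enum { FCC_E = 0, FCC_L = 1, FCC_G = 2, FCC_U = 3 };

/* Bit k of accept[cond] is set when the branch is taken for fcc = k. */
bool fbfcc_condition(unsigned cond, unsigned fcc) {
    static const uint8_t accept[16] = {
        0x0, /* 0000 FBN  : never       */
        0xE, /* 0001 FBNE : L or G or U */
        0x6, /* 0010 FBLG : L or G      */
        0xA, /* 0011 FBUL : L or U      */
        0x2, /* 0100 FBL  : L           */
        0xC, /* 0101 FBUG : G or U      */
        0x4, /* 0110 FBG  : G           */
        0x8, /* 0111 FBU  : U           */
        0xF, /* 1000 FBA  : always      */
        0x1, /* 1001 FBE  : E           */
        0x9, /* 1010 FBUE : E or U      */
        0x5, /* 1011 FBGE : E or G      */
        0xD, /* 1100 FBUGE: E or G or U */
        0x3, /* 1101 FBLE : E or L      */
        0xB, /* 1110 FBULE: E or L or U */
        0x7, /* 1111 FBO  : E or L or G */
    };
    return (accept[cond & 0xF] >> (fcc & 3)) & 1u;
}

/* Hypothetical helper: the fcc value a floating-point compare would produce. */
unsigned compare_to_fcc(double a, double b) {
    if (a < b) return FCC_L;
    if (a > b) return FCC_G;
    if (a == b) return FCC_E;
    return FCC_U;  /* at least one operand is NaN */
}
```

The table makes the pairing visible: for cond values 0001 through 0111, the entry at cond + 8 is the complement, so each conditional branch has an exact opposite.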


Annulment, delay instructions, and delayed control transfers are described 
further in Chapter 6. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FBfcc instruction causes an fp disabled exception. 


Exceptions fp disabled 
7.19 Branch on Floating-Point Condition Codes with Prediction (FBPfcc)


Instruction  cond  Operation                                  fcc Test      Assembly Language Syntax             Class
FBPA         1000  Branch Always                              1             fba{,a}{,pt|,pn}    %fccn, label     A1
FBPN         0000  Branch Never                               0             fbn{,a}{,pt|,pn}    %fccn, label     A1
FBPU         0111  Branch on Unordered                        U             fbu{,a}{,pt|,pn}    %fccn, label     A1
FBPG         0110  Branch on Greater                          G             fbg{,a}{,pt|,pn}    %fccn, label     A1
FBPUG        0101  Branch on Unordered or Greater             G or U        fbug{,a}{,pt|,pn}   %fccn, label     A1
FBPL         0100  Branch on Less                             L             fbl{,a}{,pt|,pn}    %fccn, label     A1
FBPUL        0011  Branch on Unordered or Less                L or U        fbul{,a}{,pt|,pn}   %fccn, label     A1
FBPLG        0010  Branch on Less or Greater                  L or G        fblg{,a}{,pt|,pn}   %fccn, label     A1
FBPNE        0001  Branch on Not Equal                        L or G or U   fbne†{,a}{,pt|,pn}  %fccn, label     A1
FBPE         1001  Branch on Equal                            E             fbe‡{,a}{,pt|,pn}   %fccn, label     A1
FBPUE        1010  Branch on Unordered or Equal               E or U        fbue{,a}{,pt|,pn}   %fccn, label     A1
FBPGE        1011  Branch on Greater or Equal                 E or G        fbge{,a}{,pt|,pn}   %fccn, label     A1
FBPUGE       1100  Branch on Unordered or Greater or Equal    E or G or U   fbuge{,a}{,pt|,pn}  %fccn, label     A1
FBPLE        1101  Branch on Less or Equal                    E or L        fble{,a}{,pt|,pn}   %fccn, label     A1
FBPULE       1110  Branch on Unordered or Less or Equal       E or L or U   fbule{,a}{,pt|,pn}  %fccn, label     A1
FBPO         1111  Branch on Ordered                          E or L or G   fbo{,a}{,pt|,pn}    %fccn, label     A1

† synonym: fbnz   ‡ synonym: fbz

Format: op = 00 (bits 31:30) | a (29) | cond (28:25) | op2 = 101 (24:22) | cc1 (21) | cc0 (20) | p (19) | disp19 (18:0)

cc1  cc0  Condition Code
0    0    fcc0
0    1    fcc1
1    0    fcc2
1    1    fcc3
FBPfcc


Programming Note | To set the annul (a) bit for FBPfcc instructions, append ",a" to the
opcode mnemonic. For example, use "fbl,a %fcc3, label". In the preceding table,
braces signify that the ",a" is optional. To set the branch prediction bit, append
either ",pt" (for predict taken) or ",pn" (for predict not taken) to the opcode
mnemonic. If neither ",pt" nor ",pn" is specified, the assembler defaults to ",pt". To
select the appropriate floating-point condition code, include "%fcc0", "%fcc1",
"%fcc2", or "%fcc3" before the label.


Description Unconditional branches and Fcc-conditional branches are described below.


■ Unconditional branches (FBPA, FBPN): If its annul field is 0, an FBPN
(Floating-Point Branch Never with Prediction) instruction acts like a NOP, and the
following (delay) instruction is executed; if the annul (a) bit is 1, the following
instruction is annulled (not executed). In no case does an FBPN cause a transfer of
control to take place.


FBPA (Floating-Point Branch Always with Prediction) causes an unconditional
PC-relative, delayed control transfer to the address
"PC + (4 × sign_ext(disp19))". If the annul field of the branch instruction is 1, the
delay instruction is annulled (not executed). If the annul (a) bit is 0, the delay
instruction is executed.


■ Fcc-conditional branches: Conditional FBPfcc instructions (all except FBPA and
FBPN) evaluate one of the four floating-point condition codes (fcc0, fcc1, fcc2,
fcc3), as selected by cc0 and cc1, according to the cond field of the instruction,
producing either a TRUE or FALSE result. If TRUE, the branch is taken; that is, the
instruction causes a PC-relative, delayed control transfer to the address
"PC + (4 × sign_ext(disp19))". If FALSE, the branch is not taken.




















If a conditional branch is taken, the delay instruction is always executed, 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than it 
does on unconditional branches. 
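The branch rules above can be captured in a short C model. This is an illustrative sketch, not code from this specification: the function names are hypothetical, fcc values are assumed to use the FCMP encoding (0 = E, 1 = L, 2 = G, 3 = U), and the instruction word is assumed to carry a in bit 29, cond in bits 28:25, cc1:cc0 in bits 21:20, and disp19 in bits 18:0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit k of fbpfcc_mask[cond] is 1 when the branch is taken for fcc == k
   (0 = E, 1 = L, 2 = G, 3 = U). Values follow the cond table above. */
static const uint8_t fbpfcc_mask[16] = {
    0x0, /* 0000 fbn   never       */
    0xE, /* 0001 fbne  L or G or U */
    0x6, /* 0010 fblg  L or G      */
    0xA, /* 0011 fbul  L or U      */
    0x2, /* 0100 fbl   L           */
    0xC, /* 0101 fbug  G or U      */
    0x4, /* 0110 fbg   G           */
    0x8, /* 0111 fbu   U           */
    0xF, /* 1000 fba   always      */
    0x1, /* 1001 fbe   E           */
    0x9, /* 1010 fbue  E or U      */
    0x5, /* 1011 fbge  E or G      */
    0xD, /* 1100 fbuge E or G or U */
    0x3, /* 1101 fble  E or L      */
    0xB, /* 1110 fbule E or L or U */
    0x7, /* 1111 fbo   E or L or G */
};

/* fcc[] holds fcc0..fcc3; cc1:cc0 selects which one is tested. */
bool fbpfcc_taken(uint32_t instr, const unsigned fcc[4]) {
    unsigned cond = (instr >> 25) & 0xF;
    unsigned cc = (instr >> 20) & 0x3;
    return (fbpfcc_mask[cond] >> (fcc[cc] & 0x3)) & 1u;
}

/* Branch target: PC + 4 * sign_ext(disp19). */
uint64_t fbpfcc_target(uint64_t pc, uint32_t instr) {
    int64_t disp19 = (int64_t)(instr & 0x7FFFF);
    if (disp19 & 0x40000)
        disp19 -= 0x80000;              /* sign-extend from 19 bits */
    return pc + (uint64_t)(disp19 * 4);
}

/* Is the delay instruction executed? FBPA/FBPN annul whenever a = 1;
   conditional forms annul only when the branch is not taken and a = 1. */
bool fbpfcc_delay_executed(uint32_t instr, const unsigned fcc[4]) {
    unsigned cond = (instr >> 25) & 0xF;
    bool annul = (instr >> 29) & 1u;
    if (cond == 0x0 || cond == 0x8)
        return !annul;
    return fbpfcc_taken(instr, fcc) || !annul;
}
```

Keeping the conditions as a 16-entry mask table makes the complementary pairs visible: each code with bit 3 set is the bitwise complement of the code with bit 3 clear.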


The predict bit (p) gives the hardware a hint about whether the branch is expected 
to be taken. A 1 in the p bit indicates that the branch is expected to be taken. A 0 
indicates that the branch is expected not to be taken. 


Annulment, delay instructions, and delayed control transfers are described further 
in Chapter 6, Instruction Set Overview. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FBPfcc instruction causes an fp disabled exception. 


Exceptions fp disabled
7.20 SIMD Signed Compare








Instruction  opf          Operation                         s1   s2   d    Assembly Language Syntax              Class

FCMPLE16     0 0010 0000  Four 16-bit compares;             f64  f64  i64  fcmple16 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 ≤ src2
FCMPNE16     0 0010 0010  Four 16-bit compares;             f64  f64  i64  fcmpne16 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 ≠ src2
FCMPLE32     0 0010 0100  Two 32-bit compares;              f64  f64  i64  fcmple32 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 ≤ src2
FCMPNE32     0 0010 0110  Two 32-bit compares;              f64  f64  i64  fcmpne32 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 ≠ src2
FCMPGT16     0 0010 1000  Four 16-bit compares;             f64  f64  i64  fcmpgt16 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 > src2
FCMPEQ16     0 0010 1010  Four 16-bit compares;             f64  f64  i64  fcmpeq16 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 = src2
FCMPGT32     0 0010 1100  Two 32-bit compares;              f64  f64  i64  fcmpgt32 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 > src2
FCMPEQ32     0 0010 1110  Two 32-bit compares;              f64  f64  i64  fcmpeq32 freg_rs1, freg_rs2, reg_rd   C3
                          set R[rd] if src1 = src2


[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description Either four 16-bit signed values or two 32-bit signed values in Fp[rs1] and Fp[rs2] 
are compared. The 4-bit or 2-bit condition-code results are stored in the least 
significant bits of the integer register R[rd]. The least significant 16-bit or 32-bit 
compare result corresponds to bit zero of R[rd]. 


Note | Bits 63:4 of the destination register R[rd] are set to zero for 16-bit 
compares. Bits 63:2 of the destination register R[rd] are set to 
zero for 32-bit compares. 


For FCMPGT{16,32}, each bit in the result is set to 1 if the corresponding signed 
value in Fp[rs1] is greater than the signed value in Fp[rs2]. Less-than comparisons 
are made by swapping the operands. 


For FCMPLE{16,32}, each bit in the result is set to 1 if the corresponding signed value 
in Fp[rs1] is less than or equal to the signed value in Fp[rs2]. Greater-than-or-equal 
comparisons are made by swapping the operands. 


For FCMPEQ{16,32}, each bit in the result is set to 1 if the corresponding signed 
value in Fp[rs1] is equal to the signed value in Fp[rs2]. 
For FCMPNE{16,32}, each bit in the result is set to 1 if the corresponding signed
value in Fp[rs1] is not equal to the signed value in Fp[rs2].


FIGURE 7-7 and FIGURE 7-8 illustrate 16-bit and 32-bit pixel comparison operations, 
respectively. 


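FIGURE 7-7 [16-bit comparison, fcmp[gt, le, eq, ne, lt, ge]16: the four 16-bit fields of Fp[rs1] (bits 63:48, 47:32, 31:16, 15:0) are compared with the corresponding fields of Fp[rs2], and the results are written to R[rd]]


FIGURE 7-8 Two 32-bit Signed Fixed-point SIMD Comparison Operation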


In all comparisons, if a compare condition is not true, the corresponding bit in the 
result is set to 0. 
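As a rough illustration of the lane ordering and result bits described above, the following C sketch models the FCMPGT, FCMPLE, FCMPEQ, and FCMPNE forms on 64-bit register images. The enum and function names are hypothetical, and the sketch assumes lane 0 occupies the least significant bits of each operand.

```c
#include <stdint.h>

typedef enum { CMP_GT, CMP_LE, CMP_EQ, CMP_NE } simd_cmp;

/* Sign-extend the low 'bits' bits of v (portable, no implementation-defined casts). */
static int64_t sext(uint64_t v, unsigned bits) {
    uint64_t sign = 1ull << (bits - 1);
    v &= (sign << 1) - 1;
    return (int64_t)(v ^ sign) - (int64_t)sign;
}

static int holds(simd_cmp op, int64_t a, int64_t b) {
    switch (op) {
    case CMP_GT: return a > b;
    case CMP_LE: return a <= b;
    case CMP_EQ: return a == b;
    default:     return a != b;   /* CMP_NE */
    }
}

/* lane_bits = 16 gives four compares (result in bits 3:0);
   lane_bits = 32 gives two compares (result in bits 1:0).
   All other bits of the returned R[rd] image are zero. */
uint64_t simd_fcmp(simd_cmp op, unsigned lane_bits, uint64_t rs1, uint64_t rs2) {
    unsigned lanes = 64 / lane_bits;
    uint64_t rd = 0;
    for (unsigned i = 0; i < lanes; i++) {
        int64_t a = sext(rs1 >> (lane_bits * i), lane_bits);
        int64_t b = sext(rs2 >> (lane_bits * i), lane_bits);
        if (holds(op, a, b))
            rd |= 1ull << i;      /* least significant lane -> bit 0 */
    }
    return rd;
}
```

Because the values are signed, a lane holding 0x8000 compares as -32768, so it is not greater than zero.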


Programming | The results of a SIMD signed compare operation can be used 
Note | directly by both integer operations (for example, partial stores) 
and partitioned conditional moves. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute a SIMD signed compare instruction causes an fp disabled 
exception. 
Exceptions fp disabled


See Also STPARTIALF on page 347 


182 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


FCMP<s|d|q> / FCMPE<s|d|q>





7.21 Floating-Point Compare 











Instruction  opf          Operation                                    Assembly Language Syntax            Class

FCMPs        0 0101 0001  Compare Single                               fcmps %fccn, freg_rs1, freg_rs2     A1
FCMPd        0 0101 0010  Compare Double                               fcmpd %fccn, freg_rs1, freg_rs2     A1
FCMPq        0 0101 0011  Compare Quad                                 fcmpq %fccn, freg_rs1, freg_rs2     C3
FCMPEs       0 0101 0101  Compare Single and Exception if Unordered    fcmpes %fccn, freg_rs1, freg_rs2    A1
FCMPEd       0 0101 0110  Compare Double and Exception if Unordered    fcmped %fccn, freg_rs1, freg_rs2    A1
FCMPEq       0 0101 0111  Compare Quad and Exception if Unordered      fcmpeq %fccn, freg_rs1, freg_rs2    C3





[Instruction format: 10 (31:30) | 000 (29:27) | cc1 cc0 (26:25) | op3 = 11 0101 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]

cc1  cc0  Condition Code
0    0    fcc0
0    1    fcc1
1    0    fcc2
1    1    fcc3


Description These instructions compare F[rs1] with F[rs2] and set the selected floating-point
condition code (fccn) as follows:


Relation                            Resulting fcc value
freg_rs1 = freg_rs2                 0
freg_rs1 < freg_rs2                 1
freg_rs1 > freg_rs2                 2
freg_rs1 ? freg_rs2 (unordered)     3


The "?" in the preceding table means that the compared values are unordered. The
unordered condition occurs when one or both of the operands to the comparison is a
signalling or quiet NaN.


The "compare and cause exception if unordered" (FCMPEs, FCMPEd, and FCMPEq)
instructions cause an invalid (NV) exception if either operand is a NaN.

FCMP causes an invalid (NV) exception if either operand is a signalling NaN.
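The fcc encoding and the NV rules can be sketched in C for the double-precision forms. This model works on raw IEEE 754 bit patterns so that signalling NaNs are not quieted by the host; the function names are hypothetical, and it assumes the usual IEEE 754 convention that a NaN whose most significant fraction bit (bit 51) is clear is signalling.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool is_nan(uint64_t x) {
    return ((x >> 52) & 0x7FF) == 0x7FF && (x & 0xFFFFFFFFFFFFFull) != 0;
}

static bool is_snan(uint64_t x) {
    return is_nan(x) && !(x & (1ull << 51));   /* quiet bit clear */
}

/* Returns the fcc value (0 equal, 1 less, 2 greater, 3 unordered) and sets *nv
   when an invalid (NV) exception would be raised. fcmpe = true models FCMPEd. */
unsigned fcmpd_fcc(uint64_t rs1, uint64_t rs2, bool fcmpe, bool *nv) {
    if (is_nan(rs1) || is_nan(rs2)) {
        *nv = fcmpe || is_snan(rs1) || is_snan(rs2);
        return 3;
    }
    *nv = false;
    double a, b;
    memcpy(&a, &rs1, sizeof a);
    memcpy(&b, &rs2, sizeof b);
    if (a == b)
        return 0;                              /* also covers +0 versus -0 */
    return a < b ? 1 : 2;
}
```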


V8 Compatibility | Unlike the SPARC V8 architecture, SPARC V9 and the 
Note | UltraSPARC Architecture do not require an instruction between a 
floating-point compare operation and a floating-point branch 
(FBfcc, FBPfcc). 


SPARC V8 floating-point compare instructions are required to 
have rd = 0. In SPARC V9 and the UltraSPARC Architecture, bits 
26 and 25 of the instruction (rd{1:0}) specify the floating-point 
condition code to be set. Legal SPARC V8 code will work on 
SPARC V9 and the UltraSPARC Architecture because the zeroes 
in the rd field are interpreted as fcc0 and the FBfcc
instruction branches based on the value of fcc0.


An attempt to execute an FCMP instruction when instruction bits 29:27 are nonzero
causes an illegal instruction exception.


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware the instructions that refer to quad-precision floating- 
point registers. An attempt to execute FCMPq or FCMPEq 
generates fp exception other (with 
FSRftt = unimplemented FPop), which causes a trap, allowing 
privileged software to emulate the instruction. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FCMP or FCMPE instruction causes an fp disabled exception. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal instruction

fp disabled

fp exception ieee 754 (NV) 

fp exception other (FSR.ftt = unimplemented FPop (FCMPq, FCMPEq only)) 
7.22 Floating-Point Divide


Instruction  op3      opf          Operation      Assembly Language Syntax           Class
FDIVs        11 0100  0 0100 1101  Divide Single  fdivs freg_rs1, freg_rs2, freg_rd  A1
FDIVd        11 0100  0 0100 1110  Divide Double  fdivd freg_rs1, freg_rs2, freg_rd  A1
FDIVq        11 0100  0 0100 1111  Divide Quad    fdivq freg_rs1, freg_rs2, freg_rd  C3


[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0100 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description


The floating-point divide instructions divide the contents of the floating-point 
register(s) specified by the rs1 field by the contents of the floating-point register(s) 
specified by the rs2 field. The instructions then write the quotient into the floating- 
point register(s) specified by the rd field. 


Rounding is performed as specified by FSR.rd. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FDIV instruction causes an fp disabled exception.


If the FPU is enabled, FDIVq causes an fp exception other (with FSR.ftt = 
unimplemented FPop), since that instruction is not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 


Note | For FDIVs and FDIVd, an fp exception other with 
FSR.ftt = unfinished FPop can occur if the divide unit detects 
unusual, implementation-specific conditions. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal instruction

fp disabled

fp exception other (FSR.ftt = unimplemented FPop (FDIVq only))
fp exception other (FSR.ftt = unfinished FPop (FDIVs, FDIVd))

fp exception ieee 754 (OF, UF, DZ, NV, NX) 
7.23 FEXPAND





Instruction  opf          Operation            s1  s2   d    Assembly Language Syntax   Class
FEXPAND      0 0100 1101  Four 16-bit expands  -   f32  f64  fexpand freg_rs2, freg_rd  C3

[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | 0 (18:14) | opf (13:5) | rs2 (4:0)]


Description  FEXPAND takes four 8-bit unsigned integers from Fs[rs2], converts each integer to a
16-bit fixed-point value, and stores the four resulting 16-bit values in a 64-bit
floating-point register Fp[rd]. FIGURE 7-9 illustrates the operation.






































FIGURE 7-9 FEXPAND Operation [each byte of Fs[rs2] (bits 31:24, 23:16, 15:8, 7:0) occupies bits 11:4 of the corresponding 16-bit field of Fp[rd] (bits 63:48, 47:32, 31:16, 15:0); bits 15:12 and 3:0 of each field are 0]


This operation is carried out as follows: 
1. Left-shift each 8-bit value by 4 and zero-extend each result to a 16-bit fixed-point value.


2. Store the result in the destination register, Fp[rd]. 
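The two steps above amount to a shift and a repack per byte. A minimal C sketch, with a hypothetical function name, assuming byte 0 of Fs[rs2] (bits 7:0) maps to the least significant 16-bit field of Fp[rd]:

```c
#include <stdint.h>

uint64_t fexpand_model(uint32_t fs_rs2) {
    uint64_t fd_rd = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint64_t byte = (fs_rs2 >> (8 * i)) & 0xFF;   /* byte i, i = 0 least significant */
        uint64_t field = byte << 4;                  /* step 1: bits 11:4; bits 15:12 and 3:0 stay 0 */
        fd_rd |= field << (16 * i);                  /* step 2: place in the i-th 16-bit field */
    }
    return fd_rd;
}
```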


Programming | FEXPAND performs the inverse of the FPACK16 operation. 
Note 


In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware, causes an illegal instruction exception, and is emulated in 
software. 


An attempt to execute an FEXPAND instruction when instruction bits 18:14 are 
nonzero causes an illegal instruction exception. 


Exceptions illegal instruction 


See Also FPMERGE on page 221 
FPACK on page 212 
FiTO<s|d|q>





7.24 Convert 32-bit Integer to Floating Point 


Instruction  op3      opf          Operation                          s1  s2   d     Assembly Language Syntax  Class

FiTOs        11 0100  0 1100 0100  Convert 32-bit Integer to Single   -   f32  f32   fitos freg_rs2, freg_rd   A1
FiTOd        11 0100  0 1100 1000  Convert 32-bit Integer to Double   -   f32  f64   fitod freg_rs2, freg_rd   A1
FiTOq        11 0100  0 1100 1100  Convert 32-bit Integer to Quad     -   f32  f128  fitoq freg_rs2, freg_rd   C3

[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0100 (24:19) | 0 (18:14) | opf (13:5) | rs2 (4:0)]


Description FiTOs, FiTOd, and FiTOq convert the 32-bit signed integer operand in floating-point 
register Fs[rs2] into a floating-point number in the destination format. All write 
their result into the floating-point register(s) specified by rd. 


The value of FSR.rd determines how rounding is performed by FiTOs. 
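Only FiTOs consults FSR.rd because a 32-bit integer always fits exactly in a double (or quad) significand, while a single-precision significand holds only 24 bits. The C sketch below illustrates this; the function names are hypothetical, and the host's C conversions (typically round-to-nearest) stand in for FSR.rd.

```c
#include <stdbool.h>
#include <stdint.h>

float  fitos_model(int32_t x) { return (float)x; }   /* may round */
double fitod_model(int32_t x) { return (double)x; }  /* always exact */

/* True when converting x to single precision loses information,
   i.e. when FiTOs could signal an inexact (NX) exception. */
bool fitos_inexact(int32_t x) {
    return (int64_t)fitos_model(x) != (int64_t)x;
}
```

For example, 16777217 (2^24 + 1) cannot be represented in single precision, whereas every 32-bit integer converts exactly to double, which matches the exception list below naming NX for FiTOs only.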


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute a FiTOq instruction causes an 
illegal instruction exception, allowing privileged software to 
emulate the instruction. 


An attempt to execute an FiTO<s|d|q> instruction when instruction bits 18:14 are
nonzero causes an illegal instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FiTO<s|d|q> instruction causes an fp disabled exception.


If the FPU is enabled, FiTOq causes an fp exception other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal instruction 
fp disabled
fp exception other (FSR.ftt = unimplemented FPop (FiTOq)) 
fp exception ieee 754 (NX (FiTOs only)) 


CHAPTER 7 • Instructions 187


FLUSH 





7.25 Flush Instruction Memory


Instruction op3 Operation Assembly Language Syntax† Class


FLUSH 11 1011 Flush Instruction Memory flush [address] A1 





† The original assembly language syntax for a FLUSH instruction ("flush address") has been deprecated because
of inconsistency with other SPARC assembly language syntax. Over time, assemblers will support the
new syntax for this instruction. In the meantime, some existing assemblers may only recognize the original syntax.


[Instruction format: 10 (31:30) | 0 (29:25) | op3 = 11 1011 (24:19) | rs1 (18:14) | i = 0 (13) | unused (12:5) | rs2 (4:0); when i = 1, bits 12:0 hold simm13]


Description FLUSH ensures that the aligned doubleword specified by the effective address is 
consistent across any local caches and, in a multiprocessor system, will eventually 
(impl. dep. #122-V9) become consistent everywhere. 


The SPARC V9 instruction set architecture does not guarantee consistency between
instruction memory and data memory. When software writes¹ to a memory location
that may be executed as an instruction (self-modifying code²), a potential memory
consistency problem arises, which is addressed by the FLUSH instruction. Use of
FLUSH after instruction memory has been modified ensures that instruction and
data memory are synchronized for the processor that issues the FLUSH instruction.


The virtual processor waits until all previous (cacheable) stores have completed 
before issuing a FLUSH instruction. For the purpose of memory ordering, a FLUSH 
instruction behaves like a store instruction. 


In the following discussion, P_FLUSH refers to the virtual processor that executed the
FLUSH instruction.


FLUSH causes a synchronization within a virtual processor which ensures that
instruction fetches from the specified effective address by P_FLUSH appear to execute
after any loads, stores, and atomic load-stores to that address issued by P_FLUSH prior
to the FLUSH. In a multiprocessor system, FLUSH also ensures that these values will
eventually become visible to the instruction fetches of all other virtual processors in
the system. With respect to MEMBAR-induced orderings, FLUSH behaves as if it is a
store operation (see Memory Barrier on page 275).


1. This includes the use of store instructions (executed on the same or another virtual processor) that write to
instruction memory, or any other means of writing into instruction memory (for example, a DMA transfer).


2. Practiced, for example, by software such as debuggers and dynamic linkers.
If i = 0, the effective address operand for the FLUSH instruction is "R[rs1] + R[rs2]";
if i = 1, it is "R[rs1] + sign_ext(simm13)". The three least-significant bits of the
effective address are ignored; that is, the effective address always refers to an
aligned doubleword.
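The effective-address rule can be sketched in C as follows. The function name is hypothetical, and r[] is an assumed array standing in for the integer registers R[0] through R[31]; the field positions (rs1 in bits 18:14, i in bit 13, rs2 in bits 4:0, simm13 in bits 12:0) follow the usual SPARC V9 register/immediate instruction layout.

```c
#include <stdint.h>

uint64_t flush_ea(uint32_t instr, const uint64_t r[32]) {
    unsigned rs1 = (instr >> 14) & 0x1F;
    unsigned i   = (instr >> 13) & 0x1;
    uint64_t ea;
    if (i == 0) {
        ea = r[rs1] + r[instr & 0x1F];          /* R[rs1] + R[rs2] */
    } else {
        int64_t simm13 = (int64_t)(instr & 0x1FFF);
        if (simm13 & 0x1000)
            simm13 -= 0x2000;                   /* sign-extend from 13 bits */
        ea = r[rs1] + (uint64_t)simm13;         /* R[rs1] + sign_ext(simm13) */
    }
    return ea & ~(uint64_t)7;                   /* low 3 bits ignored: aligned doubleword */
}
```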


See implementation-specific documentation for details on specific implementations 
of the FLUSH instruction. 


On an UltraSPARC Architecture processor: 


■ A FLUSH instruction causes a synchronization within the virtual processor on
which the FLUSH is executed, which flushes its instruction pipeline to ensure that
no instruction already fetched has subsequently been modified in memory. Any
other virtual processors on the same physical processor are unaffected by a
FLUSH.


■ Coherency between instruction and data memories may or may not be
maintained by hardware.


IMPL. DEP. #409-S10-Cs20: The implementation of the FLUSH instruction is
implementation dependent. If the implementation automatically maintains
consistency between instruction and data memory,
(1) the FLUSH address is ignored and
(2) the FLUSH instruction cannot cause any data access exceptions, because
its effective address operand is not translated or used by the MMU.
On the other hand, if the implementation does not maintain consistency between
instruction and data memory, the FLUSH address is used to access the MMU and the
FLUSH instruction can cause data access exceptions.


Programming | For portability across all SPARC V9 implementations, software
Note | must always supply the target effective address in FLUSH
instructions.


■ If the implementation contains instruction prefetch buffers:
  • the instruction prefetch buffer(s) are invalidated
  • instruction prefetching is suspended, but may resume starting with the
    instruction immediately following the FLUSH


Programming | 1. Typically, FLUSH is used in self-modifying code.
Notes | The use of self-modifying code is discouraged.


2. If a program includes self-modifying code, to be portable it must 
issue a FLUSH instruction for each modified doubleword of 
instructions (or make a call to privileged software that has an 
equivalent effect) after storing into the instruction stream. 
3. The order in which memory is modified can be controlled by
means of FLUSH and MEMBAR instructions interspersed 
appropriately between stores and atomic load-stores. FLUSH is 
needed only between a store and a subsequent instruction fetch 
from the modified location. When multiple processes may 
concurrently modify live (that is, potentially executing) code, the 
programmer must ensure that the order of update maintains the 
program in a semantically correct form at all times. 


4. The memory model guarantees in a uniprocessor that data loads 
observe the results of the most recent store, even if there is no 
intervening FLUSH. 


5. FLUSH may be a time-consuming operation
(see the Implementation Note below).


6. In a multiprocessor system, the effects of a FLUSH operation 
will be globally visible before any subsequent store becomes 
globally visible. 


7. FLUSH is designed to act on a doubleword. On some 
implementations, FLUSH may trap to system software. For these 
reasons, system software should provide a service routine, 
callable by nonprivileged software, for flushing arbitrarily-sized 
regions of memory. On some implementations, this routine 
would issue a series of FLUSH instructions; on others, it might 
issue a single trap to system software that would then flush the 
entire region. 


8. FLUSH operates using the current (implicit) context. Therefore, 
a FLUSH executed in privileged or hyperprivileged mode will 
use the nucleus context and will not necessarily affect instruction 
cache lines containing data from a user (nonprivileged) context. 
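Note 7 suggests a service routine for flushing arbitrary regions. The sketch below shows only the address arithmetic such a routine might use when it issues one FLUSH per doubleword; the function name and interface are hypothetical, and a real routine would execute FLUSH at each address rather than record it.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the aligned doubleword addresses covering [start, start + len)
   into out[] (up to max entries) and returns how many FLUSHes are needed. */
size_t flush_region_addrs(uint64_t start, uint64_t len, uint64_t *out, size_t max) {
    if (len == 0)
        return 0;
    uint64_t first = start & ~(uint64_t)7;
    uint64_t last  = (start + len - 1) & ~(uint64_t)7;
    size_t n = 0;
    for (uint64_t a = first; ; a += 8) {
        if (n < max)
            out[n] = a;          /* a real routine would execute FLUSH [a] here */
        n++;
        if (a == last)
            break;
    }
    return n;
}
```

An unaligned 8-byte region straddles two doublewords, so it needs two FLUSH instructions.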


Implementation | In a multiprocessor configuration, FLUSH requires all processors 
Note | that may be referencing the addressed doubleword to flush their 
instruction caches, which is a potentially disruptive activity. 


V9 Compatibility | The effect of a FLUSH instruction as observed from the virtual 
Note | processor on which FLUSH executes is immediate. Other virtual 
processors in a multiprocessor system eventually will see the 
effect of the FLUSH, but the latency is implementation dependent. 


An attempt to execute a FLUSH instruction when instruction bits 29:25 are nonzero
causes an illegal instruction exception.
Exceptions illegal instruction
fast data access MMU miss (impl. dep. #409-S10-Cs20)
fast data access protection 







FLUSHW 





7.26 Flush Register Windows


Instruction op3 Operation Assembly Language Syntax Class 


FLUSHW 10 1011 Flush Register Windows flushw A1


[Instruction format: 10 (31:30) | 0 (29:25) | op3 = 10 1011 (24:19) | 0 (18:14) | i = 0 (13) | 0 (12:0)]


Description FLUSHW causes all active register windows except the current window to be 
flushed to memory at locations determined by privileged software. FLUSHW 
behaves as a NOP if there are no active windows other than the current window. At 
the completion of the FLUSHW instruction, the only active register window is the 
current one. 


Programming | The FLUSHW instruction can be used by application software to
Note | flush register windows to memory so that it can switch memory
stacks or examine register contents from previous stack frames.





FLUSHW acts as a NOP if CANSAVE = N_REG_WINDOWS − 2. Otherwise, there is
more than one active window, so FLUSHW causes a spill exception. The trap vector
for the spill exception is based on the contents of OTHERWIN and WSTATE. The spill
trap handler is invoked with the CWP set to the window to be spilled (that is,
(CWP + CANSAVE + 2) mod N_REG_WINDOWS). See Register Window Management
Instructions on page 131.
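The FLUSHW decision can be modelled in C as below. This is a sketch under stated assumptions: the names are hypothetical, and the choice between spill n normal and spill n other (OTHERWIN = 0 selects WSTATE.normal in bits 2:0; otherwise WSTATE.other in bits 5:3 is used) follows the SPARC V9 register window rules rather than anything stated in this section.

```c
#include <stdint.h>

typedef enum { FLUSHW_NOP, FLUSHW_SPILL_NORMAL, FLUSHW_SPILL_OTHER } flushw_action;

flushw_action flushw_model(unsigned cwp, unsigned cansave, unsigned otherwin,
                           unsigned wstate, unsigned n_reg_windows,
                           unsigned *spill_window, unsigned *n) {
    if (cansave == n_reg_windows - 2)
        return FLUSHW_NOP;                                  /* only the current window is active */
    *spill_window = (cwp + cansave + 2) % n_reg_windows;    /* CWP seen by the spill handler */
    if (otherwin != 0) {
        *n = (wstate >> 3) & 0x7;                           /* assumed: WSTATE.other */
        return FLUSHW_SPILL_OTHER;
    }
    *n = wstate & 0x7;                                      /* assumed: WSTATE.normal */
    return FLUSHW_SPILL_NORMAL;
}
```

Calling the model repeatedly, with CANSAVE incremented after each simulated spill, mirrors the trap-and-reexecute loop described in the following note.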


Programming | Typically, the spill handler saves a window on a memory stack 
Note | and returns to reexecute the FLUSHW instruction. Thus, FLUSHW 
traps and reexecutes until all active windows other than the 
current window have been spilled. 


An attempt to execute a FLUSHW instruction when instruction bits 29:25, 18:14, or
12:0 are nonzero causes an illegal instruction exception.


Exceptions illegal instruction 
spill n normal 
spill n other 
7.27 Floating-Point Move


Instruction  op3      opf          Operation           Assembly Language Syntax  Class
FMOVs        11 0100  0 0000 0001  Move (copy) Single  fmovs freg_rs2, freg_rd   A1
FMOVd        11 0100  0 0000 0010  Move (copy) Double  fmovd freg_rs2, freg_rd   A1
FMOVq        11 0100  0 0000 0011  Move (copy) Quad    fmovq freg_rs2, freg_rd   C3





[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0100 (24:19) | 0 (18:14) | opf (13:5) | rs2 (4:0)]


Description


FMOV copies the source floating-point register(s) to the destination floating-point 
register(s), unaltered. 


FMOVs, FMOVd, and FMOVq perform 32-bit, 64-bit, and 128-bit operations, 
respectively. 


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do 
not modify FSR.aexc, and do not treat floating-point NaN values differently from 
other floating-point values. 
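The effect on FSR can be sketched in C. The function name is hypothetical, and the sketch assumes the SPARC V9 FSR layout with cexc in bits 4:0, aexc in bits 9:5, and ftt in bits 16:14.

```c
#include <stdint.h>

#define FSR_CEXC_MASK 0x1Full          /* assumed: FSR.cexc, bits 4:0  */
#define FSR_FTT_MASK  (0x7ull << 14)   /* assumed: FSR.ftt, bits 16:14 */

/* FMOVd: the destination receives the source bits unaltered (NaNs included,
   no rounding); FSR.cexc and FSR.ftt are cleared and all other FSR fields
   (aexc, fcc, rd, ...) are left as they were. */
uint64_t fmovd_model(uint64_t fs_rs2, uint64_t *fsr) {
    *fsr &= ~(FSR_CEXC_MASK | FSR_FTT_MASK);
    return fs_rs2;
}
```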


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FMOVq instruction causes an 
illegal instruction exception, allowing privileged software to 
emulate the instruction. 


An attempt to execute an FMOV instruction when instruction bits 18:14 are nonzero 
causes an illegal instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FMOV instruction causes an fp disabled exception. 


If the FPU is enabled, an attempt to execute an FMOVq instruction causes an
fp exception other (with FSR.ftt = unimplemented FPop), since that instruction is
not implemented in hardware in UltraSPARC Architecture 2005 implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal instruction
fp disabled
fp exception other (FSR.ftt = unimplemented_FPop (FMOVq only)) 
See Also F Register Logical Operate (2 operand) on page 227

FMOVcc





7.28 Move Floating-Point Register on Condition (FMOVcc)





Instruction  opf_low  Operation                                                            Assembly Language Syntax            Class

FMOVSicc     00 0001  Move Floating-Point Single, based on 32-bit integer condition codes  fmovsicc %icc, freg_rs2, freg_rd    A1
FMOVDicc     00 0010  Move Floating-Point Double, based on 32-bit integer condition codes  fmovdicc %icc, freg_rs2, freg_rd    A1
FMOVQicc     00 0011  Move Floating-Point Quad, based on 32-bit integer condition codes    fmovqicc %icc, freg_rs2, freg_rd    C3

FMOVSxcc     00 0001  Move Floating-Point Single, based on 64-bit integer condition codes  fmovsxcc %xcc, freg_rs2, freg_rd    A1
FMOVDxcc     00 0010  Move Floating-Point Double, based on 64-bit integer condition codes  fmovdxcc %xcc, freg_rs2, freg_rd    A1
FMOVQxcc     00 0011  Move Floating-Point Quad, based on 64-bit integer condition codes    fmovqxcc %xcc, freg_rs2, freg_rd    C3

FMOVSfcc     00 0001  Move Floating-Point Single, based on floating-point condition codes  fmovsfcc %fccn, freg_rs2, freg_rd   A1
FMOVDfcc     00 0010  Move Floating-Point Double, based on floating-point condition codes  fmovdfcc %fccn, freg_rs2, freg_rd   A1
FMOVQfcc     00 0011  Move Floating-Point Quad, based on floating-point condition codes    fmovqfcc %fccn, freg_rs2, freg_rd   C3

[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0101 (24:19) | 0 (18) | cond (17:14) | opf_cc (13:11) | opf_low (10:5) | rs2 (4:0)]
Encoding of the cond Field for F.P. Moves Based on Integer Condition Codes (icc or xcc)

cond  Operation                                          icc / xcc Test         icc/xcc name(s) in Assembly Language Mnemonics
1000  Move Always                                        1                      a
0000  Move Never                                         0                      n
1001  Move if Not Equal                                  not Z                  ne (or nz)
0001  Move if Equal                                      Z                      e (or z)
1010  Move if Greater                                    not (Z or (N xor V))   g
0010  Move if Less or Equal                              Z or (N xor V)         le
1011  Move if Greater or Equal                           not (N xor V)          ge
0011  Move if Less                                       N xor V                l
1100  Move if Greater Unsigned                           not (C or Z)           gu
0100  Move if Less or Equal Unsigned                     (C or Z)               leu
1101  Move if Carry Clear (Greater or Equal, Unsigned)   not C                  cc (or geu)
0101  Move if Carry Set (Less than, Unsigned)            C                      cs (or lu)
1110  Move if Positive                                   not N                  pos
0110  Move if Negative                                   N                      neg
1111  Move if Overflow Clear                             not V                  vc
0111  Move if Overflow Set                               V                      vs


Encoding of the cond Field for F.P. Moves Based on Floating-Point Condition Codes (fccn)

cond  Operation                                  fccn Test     fcc name(s) in Assembly Language Mnemonics
1000  Move Always                                1             a
0000  Move Never                                 0             n
0111  Move if Unordered                          U             u
0110  Move if Greater                            G             g
0101  Move if Unordered or Greater               G or U        ug
0100  Move if Less                               L             l
0011  Move if Unordered or Less                  L or U        ul
0010  Move if Less or Greater                    L or G        lg
0001  Move if Not Equal                          L or G or U   ne (or nz)
1001  Move if Equal                              E             e (or z)
1010  Move if Unordered or Equal                 E or U        ue
1011  Move if Greater or Equal                   E or G        ge
1100  Move if Unordered or Greater or Equal      E or G or U   uge
1101  Move if Less or Equal                      E or L        le
1110  Move if Unordered or Less or Equal         E or L or U   ule
1111  Move if Ordered                            E or L or G   o





Encoding of opf_cc Field (also see TABLE E-10 on page 484)

opf_cc  Instruction       Condition Code to be Tested
100     FMOV<s|d|q>icc    icc
110     FMOV<s|d|q>xcc    xcc
000     FMOV<s|d|q>fcc    fcc0
001                       fcc1
010                       fcc2
011                       fcc3
101                       (illegal instruction exception)
111                       (illegal instruction exception)
Description


The FMOVcc instructions copy the floating-point register(s) specified by rs2 to the
floating-point register(s) specified by rd if the condition indicated by the cond field is
satisfied by the selected condition code: icc or xcc in CCR, or one of the floating-point
condition code fields (fcc0 through fcc3) in FSR. The condition code used is specified by
the opf_cc field of the instruction. If the condition is FALSE, then the destination
register(s) are not changed.
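For the integer-condition-code variants, the cond encodings pair up: each code with bit 3 set tests the negation of the code with bit 3 clear (for example, 1001 ne is the negation of 0001 e). A C sketch of the evaluation and the resulting move, with hypothetical names, assuming the N, Z, V, and C flags of the selected icc or xcc field are passed in:

```c
#include <stdbool.h>
#include <stdint.h>

bool icc_cond_holds(unsigned cond, bool n, bool z, bool v, bool c) {
    bool t;
    switch (cond & 0x7) {
    case 0:  t = false;         break;  /* n (never) */
    case 1:  t = z;             break;  /* e   */
    case 2:  t = z || (n != v); break;  /* le  */
    case 3:  t = n != v;        break;  /* l   */
    case 4:  t = c || z;        break;  /* leu */
    case 5:  t = c;             break;  /* cs  */
    case 6:  t = n;             break;  /* neg */
    default: t = v;             break;  /* vs  */
    }
    return (cond & 0x8) ? !t : t;       /* 1xxx negates 0xxx */
}

/* FMOVDicc / FMOVDxcc: copy rs2 to rd only when the condition holds;
   otherwise rd keeps its old value. */
uint64_t fmovd_cc_model(unsigned cond, bool n, bool z, bool v, bool c,
                        uint64_t fd_rs2, uint64_t fd_rd) {
    return icc_cond_holds(cond, n, z, v, c) ? fd_rs2 : fd_rd;
}
```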





These instructions read, but do not modify, any condition codes. 


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from 
other floating-point values. 


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FMOVQicc, FMOVQxcc, or
FMOVQfcc instruction causes an illegal instruction exception,
allowing privileged software to emulate the instruction.


An attempt to execute an FMOVcc instruction when instruction bit 18 is nonzero or
opf_cc = 101 or 111 causes an illegal instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FMOVcc instruction causes an
fp disabled exception.


If the FPU is enabled, an attempt to execute an FMOVQicc, FMOVQxcc, or
FMOVQfcc instruction causes an fp exception other (with FSR.ftt =
unimplemented FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.
Programming | Branches cause the performance of most implementations to
Note | degrade significantly. Frequently, the MOVcc and FMOVcc
instructions can be used to avoid branches. For example, the
following C language segment:


    double A, B, X;
    if (A > B) X = 1.03; else X = 0.0;


can be coded as


    ! assume A is in %f0; B is in %f2; %xx points to
    ! constant area
    ldd    [%xx+C_1.03],%f4    ! X = 1.03
    fcmpd  %fcc3,%f0,%f2       ! A > B
    fble,a %fcc3,label
    ! following instruction only executed if the
    ! preceding branch was taken
    fsubd  %f4,%f4,%f4         ! X = 0.0
    label: ...


This code takes four instructions including a branch.
With FMOVcc, this could be coded as


    ldd     [%xx+C_1.03],%f4   ! X = 1.03
    fsubd   %f4,%f4,%f6        ! X' = 0.0
    fcmpd   %fcc3,%f0,%f2      ! A > B
    fmovdle %fcc3,%f6,%f4      ! X = 0.0


This code also takes four instructions but requires no branches
and may boost performance significantly. Use MOVcc and
FMOVcc instead of branches wherever these instructions would
improve performance.


Exceptions illegal instruction
fp disabled
fp exception other (FSR.ftt = unimplemented FPop (opf_cc = 101 or 111))
fp exception other (FSR.ftt = unimplemented FPop (FMOVQ instructions only))
FMOVR


7.29 Move Floating-Point Register on Integer Register Condition (FMOVR)














Instruction  rcond  opf_low  Operation                       Test          Class
-            000    0 0101   Reserved                        -             -
FMOVRsZ      001    0 0101   Move Single if Register = 0     R[rs1] = 0    A1
FMOVRsLEZ    010    0 0101   Move Single if Register ≤ 0     R[rs1] ≤ 0    A1
FMOVRsLZ     011    0 0101   Move Single if Register < 0     R[rs1] < 0    A1
-            100    0 0101   Reserved                        -             -
FMOVRsNZ     101    0 0101   Move Single if Register ≠ 0     R[rs1] ≠ 0    A1
FMOVRsGZ     110    0 0101   Move Single if Register > 0     R[rs1] > 0    A1
FMOVRsGEZ    111    0 0101   Move Single if Register ≥ 0     R[rs1] ≥ 0    A1
-            000    0 0110   Reserved                        -             -
FMOVRdZ      001    0 0110   Move Double if Register = 0     R[rs1] = 0    A1
FMOVRdLEZ    010    0 0110   Move Double if Register ≤ 0     R[rs1] ≤ 0    A1
FMOVRdLZ     011    0 0110   Move Double if Register < 0     R[rs1] < 0    A1
-            100    0 0110   Reserved                        -             -
FMOVRdNZ     101    0 0110   Move Double if Register ≠ 0     R[rs1] ≠ 0    A1
FMOVRdGZ     110    0 0110   Move Double if Register > 0     R[rs1] > 0    A1
FMOVRdGEZ    111    0 0110   Move Double if Register ≥ 0     R[rs1] ≥ 0    A1
-            000    0 0111   Reserved                        -             -
FMOVRqZ      001    0 0111   Move Quad if Register = 0       R[rs1] = 0    C3
FMOVRqLEZ    010    0 0111   Move Quad if Register ≤ 0       R[rs1] ≤ 0    C3
FMOVRqLZ     011    0 0111   Move Quad if Register < 0       R[rs1] < 0    C3
-            100    0 0111   Reserved                        -             -
FMOVRqNZ     101    0 0111   Move Quad if Register ≠ 0       R[rs1] ≠ 0    C3
FMOVRqGZ     110    0 0111   Move Quad if Register > 0       R[rs1] > 0    C3
FMOVRqGEZ    111    0 0111   Move Quad if Register ≥ 0       R[rs1] ≥ 0    C3

[Instruction format: 10 (31:30) | rd (29:25) | op3 = 11 0101 (24:19) | rs1 (18:14) | 0 (13) | rcond (12:10) | opf_low (9:5) | rs2 (4:0)]
Assembly Language Syntax

fmovr{s,d,q}z    reg_rs1, freg_rs2, freg_rd   (synonym: fmovr{s,d,q}e)
fmovr{s,d,q}lez  reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}lz   reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}nz   reg_rs1, freg_rs2, freg_rd   (synonym: fmovr{s,d,q}ne)
fmovr{s,d,q}gz   reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}gez  reg_rs1, freg_rs2, freg_rd

Description








If the contents of integer register R[rs1] satisfy the condition specified in the rcond 
field, these instructions copy the contents of the floating-point register(s) specified 
by the rs2 field to the floating-point register(s) specified by the rd field. If the 
contents of R[rs1] do not satisfy the condition, the floating-point register(s) specified 
by the rd field are not modified. 


These instructions treat the integer register contents as a signed integer value; they 
do not modify any condition codes. 


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do 
not modify FSR.aexc, and do not treat floating-point NaN values differently from 
other floating-point values. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FMOVRq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FMOVR instruction when instruction bit 13 is nonzero or
rcond = 000₂ or 100₂ causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FMOVR instruction causes an fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMOVRq instruction causes an
fp_exception_other (with FSR.ftt = unimplemented_FPop), since that instruction is
not implemented in hardware in UltraSPARC Architecture 2005 implementations.
Implementation | If this instruction is implemented by tagging each register value
Note           | with an N (negative) and a Z (zero) condition bit, use the
                 following table to determine whether rcond is TRUE:








Instruction   Test
FMOVRNZ       not Z
FMOVRZ        Z
FMOVRGEZ      not N
FMOVRLZ       N
FMOVRLEZ      N or Z
FMOVRGZ       N nor Z
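As an illustration of the N/Z table, the following C sketch evaluates rcond from N and Z bits derived from R[rs1], treated as a signed 64-bit integer. The function names `fmovr_condition` and `fmovrd`, and the convention of returning -1 for the reserved encodings 000₂ and 100₂, are assumptions made for this example only; the instruction itself traps on those encodings rather than producing a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate rcond using the N/Z formulation from the table above.
   rs1_value is R[rs1], interpreted as a signed 64-bit integer.
   Returns 1 (move), 0 (no move), or -1 for a reserved rcond
   (a hypothetical convention for this sketch only). */
int fmovr_condition(unsigned rcond, int64_t rs1_value) {
    bool n = rs1_value < 0;    /* N: value is negative */
    bool z = rs1_value == 0;   /* Z: value is zero     */
    switch (rcond & 7u) {
    case 1:  return z;             /* FMOVRZ            */
    case 2:  return n || z;        /* FMOVRLEZ: N or Z  */
    case 3:  return n;             /* FMOVRLZ           */
    case 5:  return !z;            /* FMOVRNZ           */
    case 6:  return !(n || z);     /* FMOVRGZ: N nor Z  */
    case 7:  return !n;            /* FMOVRGEZ          */
    default: return -1;            /* 000 and 100: reserved */
    }
}

/* FMOVRd on raw 64-bit register images: F_D[rd] is replaced by F_D[rs2]
   only when the condition holds; otherwise it is left unmodified. */
uint64_t fmovrd(unsigned rcond, int64_t rs1_value, uint64_t rs2_bits, uint64_t rd_bits) {
    return fmovr_condition(rcond, rs1_value) == 1 ? rs2_bits : rd_bits;
}
```

Because the sketch copies raw bits, a NaN in F_D[rs2] is moved like any other value, which matches the statement that these instructions do not treat NaNs specially.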


Exceptions fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (rcond = 000₂ or 100₂))
fp_exception_other (FSR.ftt = unimplemented_FPop (FMOVRq))
FMUL (partitioned)

7.30 Partitioned Multiply Instructions

Instruction  opf          Operation                                                  s1   s2   d    Assembly Language Syntax                   Class
FMUL8x16     0 0011 0001  Unsigned 8-bit by signed 16-bit partitioned product          f32  f64  f64  fmul8x16    freg_rs1, freg_rs2, freg_rd   C3
FMUL8x16AU   0 0011 0011  Unsigned 8-bit by signed 16-bit upper α partitioned product  f32  f32  f64  fmul8x16au  freg_rs1, freg_rs2, freg_rd   C3
FMUL8x16AL   0 0011 0101  Unsigned 8-bit by signed 16-bit lower α partitioned product  f32  f32  f64  fmul8x16al  freg_rs1, freg_rs2, freg_rd   C3
FMUL8SUx16   0 0011 0110  Signed upper 8-bit by signed 16-bit partitioned product      f64  f64  f64  fmul8sux16  freg_rs1, freg_rs2, freg_rd   C3
FMUL8ULx16   0 0011 0111  Unsigned lower 8-bit by signed 16-bit partitioned product    f64  f64  f64  fmul8ulx16  freg_rs1, freg_rs2, freg_rd   C3
FMULD8SUx16  0 0011 1000  Signed upper 8-bit by signed 16-bit partitioned product      f32  f32  f64  fmuld8sux16 freg_rs1, freg_rs2, freg_rd   C3
FMULD8ULx16  0 0011 1001  Unsigned lower 8-bit by signed 16-bit partitioned product    f32  f32  f64  fmuld8ulx16 freg_rs1, freg_rs2, freg_rd   C3

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Programming | When software emulates an 8-bit unsigned by 16-bit signed
Note        | multiply, the unsigned value must be zero-extended and the 16-bit
              value sign-extended before the multiplication.


Description The following sections describe the versions of partitioned multiplies. 


In an UltraSPARC Architecture 2005 implementation, these instructions are not 
implemented in hardware, cause an illegal instruction exception, and are emulated 
in software. 


Exceptions illegal_instruction


7.30.1 FMUL8x16 Instruction


FMUL8x16 multiplies each unsigned 8-bit value (for example, a pixel component) in
the 32-bit floating-point register F_S[rs1] by the corresponding (signed) 16-bit
fixed-point integer in the 64-bit floating-point register F_D[rs2]. It rounds the 24-bit
product (assuming a binary point between bits 7 and 8) and stores the most significant
16 bits of the result into the corresponding 16-bit field in the 64-bit floating-point
destination register F_D[rd]. FIGURE 7-10 illustrates the operation.


Note | This instruction treats the pixel component values as fixed-point
       with the binary point to the left of the most significant bit.
       Typically, this operation is used with filter coefficients as the
       fixed-point rs2 value and image data as the rs1 pixel value.
       Appropriate scaling of the coefficient allows various fixed-point
       scaling to be realized.
FIGURE 7-10 FMUL8x16 Operation
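Because these instructions are emulated in software on UltraSPARC Architecture 2005 implementations, a small reference model can help when checking an emulation routine. The C sketch below models FMUL8x16 on raw register images. It assumes the rounding is to nearest with halfway cases rounded toward positive infinity (add 0x80, then shift right by 8), which is the rule this section states explicitly for FMUL8SUx16; the helper and function names are hypothetical.

```c
#include <stdint.h>

/* floor(v / 2^n): an arithmetic right shift that does not depend on how
   the compiler treats >> on negative operands. */
static int32_t floor_shift(int32_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* Reinterpret a 16-bit field as a two's-complement value. */
static int16_t as_s16(uint16_t u) {
    return u < 0x8000 ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

/* One lane: unsigned 8-bit pixel times signed 16-bit coefficient; the
   24-bit product is rounded to its upper 16 bits (assumed round half up). */
static uint16_t fmul8x16_lane(uint8_t pixel, int16_t coef) {
    int32_t product = (int32_t)pixel * coef;   /* fits in 24 bits */
    return (uint16_t)floor_shift(product + 0x80, 8);
}

/* Whole instruction: byte i of F_S[rs1] pairs with halfword i of F_D[rs2]
   and produces halfword i of F_D[rd] (i = 0 is least significant). */
uint64_t fmul8x16(uint32_t rs1, uint64_t rs2) {
    uint64_t rd = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint8_t pixel = (uint8_t)(rs1 >> (8 * lane));
        int16_t coef = as_s16((uint16_t)(rs2 >> (16 * lane)));
        rd |= (uint64_t)fmul8x16_lane(pixel, coef) << (16 * lane);
    }
    return rd;
}
```

For example, a pixel of 0x80 (0.5 with the binary point left of the MSB) times a coefficient of 0x0100 yields 0x0080 in its lane.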
7.30.2 FMUL8x16AU Instruction


FMUL8x16AU is the same as FMUL8x16, except that one 16-bit fixed-point value is
used as the multiplier for all four multiplies. This multiplier is the most significant
("upper") 16 bits of the 32-bit register F_S[rs2] (typically an α pixel component
value). FIGURE 7-11 illustrates the operation.


FIGURE 7-11 FMUL8x16AU Operation


7.30.3 FMUL8x16AL Instruction


FMUL8x16AL is the same as FMUL8x16AU, except that the least significant
("lower") 16 bits of the 32-bit register F_S[rs2] are used as the multiplier.
FIGURE 7-12 illustrates the operation.





FIGURE 7-12 FMUL8x16AL Operation
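The AU and AL variants differ from FMUL8x16 only in where the multiplier comes from. A minimal C sketch, under the same assumed round-half-up rule as the FMUL8x16 model and with hypothetical function names:

```c
#include <stdint.h>

/* floor(v / 2^n) without relying on >> of negative values. */
static int32_t floor_shift(int32_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* Reinterpret a 16-bit field as a two's-complement value. */
static int16_t as_s16(uint16_t u) {
    return u < 0x8000 ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

/* Multiply all four unsigned pixels of F_S[rs1] by one signed 16-bit
   multiplier, rounding each 24-bit product to its upper 16 bits. */
static uint64_t fmul8x16_common(uint32_t rs1, int16_t multiplier) {
    uint64_t rd = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        int32_t product = (int32_t)(uint8_t)(rs1 >> (8 * lane)) * multiplier;
        rd |= (uint64_t)(uint16_t)floor_shift(product + 0x80, 8) << (16 * lane);
    }
    return rd;
}

/* FMUL8x16AU: the multiplier is the upper 16 bits of F_S[rs2]. */
uint64_t fmul8x16au(uint32_t rs1, uint32_t rs2) {
    return fmul8x16_common(rs1, as_s16((uint16_t)(rs2 >> 16)));
}

/* FMUL8x16AL: the multiplier is the lower 16 bits of F_S[rs2]. */
uint64_t fmul8x16al(uint32_t rs1, uint32_t rs2) {
    return fmul8x16_common(rs1, as_s16((uint16_t)rs2));
}
```

With F_S[rs2] = 0x01000200, the AU form scales each pixel by 256/256 = 1 and the AL form by 512/256 = 2.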
7.30.4 FMUL8SUx16 Instruction


FMUL8SUx16 multiplies the most significant ("upper") 8 bits of each 16-bit signed
value in the 64-bit floating-point register F_D[rs1] by the corresponding signed 16-bit
fixed-point integer in the 64-bit floating-point register F_D[rs2]. It rounds the
24-bit product toward the nearest representable value and then stores the most
significant 16 bits of the result into the corresponding 16-bit field of the 64-bit
floating-point destination register F_D[rd]. If the product is exactly halfway between
two integers, the result is rounded toward positive infinity. FIGURE 7-13 illustrates the
operation.


FIGURE 7-13 FMUL8SUx16 Operation


7.30.5 FMUL8ULx16 Instruction


FMUL8ULx16 multiplies the unsigned least significant ("lower") 8 bits of each 16-bit
value in the 64-bit floating-point register F_D[rs1] by the corresponding fixed-point
signed 16-bit integer in the 64-bit floating-point register F_D[rs2]. Each 24-bit product
is sign-extended to 32 bits. The most significant ("upper") 16 bits of the
sign-extended value are rounded to nearest and then stored in the corresponding 16-bit
field of the 64-bit floating-point destination register F_D[rd]. If the result is exactly
halfway between two integers, the result is rounded toward positive infinity.
FIGURE 7-14 illustrates the operation; CODE EXAMPLE 7-1 exemplifies the operation.
FIGURE 7-14 FMUL8ULx16 Operation


CODE EXAMPLE 7-1 16-bit × 16-bit → 16-bit Multiply


fmul8sux16
fmul8ulx16
fpadd16
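CODE EXAMPLE 7-1 lists only the instruction sequence. The following C sketch models one 16-bit lane of that sequence to show why it works: writing a = 256·hi + lo (hi the signed upper byte, lo the unsigned lower byte), FMUL8SUx16 approximates hi·b/256, FMUL8ULx16 approximates lo·b/65536, and their wrapping 16-bit sum approximates a·b/65536 to within one unit. The function names are hypothetical, and the rounding follows the halfway-toward-positive-infinity rule stated above.

```c
#include <stdint.h>

/* floor(v / 2^n) without relying on >> of negative values. */
static int32_t floor_shift(int32_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* FMUL8SUx16 lane: signed upper byte of a times b, rounded (half up)
   to the upper 16 bits of the 24-bit product. */
static int32_t fmul8sux16_lane(int16_t a, int16_t b) {
    int32_t hi = floor_shift(a, 8);            /* signed upper 8 bits */
    return floor_shift(hi * b + 0x80, 8);
}

/* FMUL8ULx16 lane: unsigned lower byte of a times b; the sign-extended
   32-bit product is rounded (half up) to its upper 16 bits. */
static int32_t fmul8ulx16_lane(int16_t a, int16_t b) {
    int32_t lo = (uint16_t)a & 0xFF;           /* unsigned lower 8 bits */
    return floor_shift(lo * b + 0x8000, 16);
}

/* The three-instruction sequence for one lane: the final fpadd16 is a
   16-bit add whose carry out is discarded. */
int16_t mul16x16_hi(int16_t a, int16_t b) {
    uint16_t sum = (uint16_t)(fmul8sux16_lane(a, b) + fmul8ulx16_lane(a, b));
    return (int16_t)(sum < 0x8000 ? (int32_t)sum : (int32_t)sum - 0x10000);
}
```

For instance, 0x4000 × 0x4000 gives 0x1000, the upper 16 bits of the 32-bit product 0x10000000.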





7.30.6 FMULD8SUx16 Instruction 


FMULD8SUx16 multiplies the most significant ("upper") 8 bits of each 16-bit signed
value in F[rs1] by the corresponding signed 16-bit fixed-point value in F[rs2]. Each
24-bit product is shifted left by 8 bits to generate a 32-bit result, which is then stored
in the corresponding 32-bit half of the 64-bit floating-point register specified by rd.
FIGURE 7-15 illustrates the operation.

FIGURE 7-15 FMULD8SUx16 Operation
7.30.7 FMULD8ULx16 Instruction


FMULD8ULx16 multiplies the unsigned least significant ("lower") 8 bits of each
16-bit value in F[rs1] by the corresponding 16-bit fixed-point signed integer in F[rs2].
Each 24-bit product is sign-extended to 32 bits and stored in the corresponding half
of the 64-bit floating-point register specified by rd. FIGURE 7-16 illustrates the
operation; CODE EXAMPLE 7-2 exemplifies the operation.


FIGURE 7-16 FMULD8ULx16 Operation


CODE EXAMPLE 7-2 16-bit × 16-bit → 32-bit Multiply


fmuld8sux16
fmuld8ulx16
fpadd32
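Unlike the 16-bit sequence, the sequence in CODE EXAMPLE 7-2 involves no rounding, so its result is exact. The sketch below models one 32-bit lane in C under the same decomposition a = 256·hi + lo: FMULD8SUx16 contributes 256·hi·b, FMULD8ULx16 contributes lo·b, and the fpadd32 sum is a·b. The function names are hypothetical.

```c
#include <stdint.h>

/* floor(v / 2^n) without relying on >> of negative values. */
static int32_t floor_shift(int32_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* FMULD8SUx16 lane: signed upper byte of a times b, shifted left 8 bits.
   The magnitude never exceeds 2^30, so int32_t cannot overflow. */
static int32_t fmuld8sux16_lane(int16_t a, int16_t b) {
    return floor_shift(a, 8) * b * 256;
}

/* FMULD8ULx16 lane: unsigned lower byte of a times b, as a signed 32-bit value. */
static int32_t fmuld8ulx16_lane(int16_t a, int16_t b) {
    return (int32_t)((uint16_t)a & 0xFF) * b;
}

/* fpadd32 of the two partial products (a wrapping 32-bit add):
   256*hi*b + lo*b == a*b exactly. */
int32_t mul16x16_full(int16_t a, int16_t b) {
    uint32_t sum = (uint32_t)fmuld8sux16_lane(a, b) + (uint32_t)fmuld8ulx16_lane(a, b);
    return sum <= (uint32_t)INT32_MAX ? (int32_t)sum
                                      : (int32_t)((int64_t)sum - 4294967296LL);
}
```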
FMUL«s|d|q»

7.31 Floating-Point Multiply

Instruction  op3      opf          Operation                   Assembly Language Syntax              Class
FMULs        11 0100  0 0100 1001  Multiply Single             fmuls   freg_rs1, freg_rs2, freg_rd   A1
FMULd        11 0100  0 0100 1010  Multiply Double             fmuld   freg_rs1, freg_rs2, freg_rd   A1
FMULq        11 0100  0 0100 1011  Multiply Quad               fmulq   freg_rs1, freg_rs2, freg_rd   C3
FsMULd       11 0100  0 0110 1001  Multiply Single to Double   fsmuld  freg_rs1, freg_rs2, freg_rd   A1
FdMULq       11 0100  0 0110 1110  Multiply Double to Quad     fdmulq  freg_rs1, freg_rs2, freg_rd   C3

10 | rd | 110100 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description The floating-point multiply instructions multiply the contents of the floating-point 
register(s) specified by the rs1 field by the contents of the floating-point register(s) 
specified by the rs2 field. The instructions then write the product into the floating- 
point register(s) specified by the rd field. 


The FsMULd instruction provides the exact double-precision product of two
single-precision operands, without underflow, overflow, or rounding error. Similarly,
FdMULq provides the exact quad-precision product of two double-precision
operands.
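The exactness of FsMULd follows from operand widths: a single-precision significand has 24 bits, so the product of two has at most 48 bits, which fits in the 53-bit double-precision significand, and the double-precision exponent range covers every such product. The C sketch below relies on the same argument; it assumes the host's `float` and `double` are IEEE 754 binary32 and binary64, and the function name is hypothetical.

```c
/* Widen both single-precision operands before multiplying. Under the
   assumptions above, the double-precision product is exact, so the
   rounding mode (FSR.rd on SPARC) never affects the result. */
double fsmuld(float a, float b) {
    return (double)a * (double)b;
}
```

For example, with a = b = 1 + 2^-23 the exact product 1 + 2^-22 + 2^-46 is returned unchanged, whereas a single-precision multiply would have to round away the 2^-46 term.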


Rounding is performed as specified by FSR.rd. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FMULq or FdMULq instruction
causes an illegal_instruction exception, allowing privileged
software to emulate the instruction. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute any FMUL instruction causes an fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMULq or FdMULq instruction
causes an fp_exception_other (with FSR.ftt = unimplemented_FPop), since that
instruction is not implemented in hardware in UltraSPARC Architecture 2005
implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 
Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FMULq, FdMULq only))
fp_exception_other (FSR.ftt = unfinished_FPop)
fp_exception_ieee_754 (any: NV; FMUL«s|d|q» only: OF, UF, NX)
FNEG

7.32 Floating-Point Negate

Instruction  op3      opf          Operation       Assembly Language Syntax   Class
FNEGs        11 0100  0 0000 0101  Negate Single   fnegs  freg_rs2, freg_rd   A1
FNEGd        11 0100  0 0000 0110  Negate Double   fnegd  freg_rs2, freg_rd   A1
FNEGq        11 0100  0 0000 0111  Negate Quad     fnegq  freg_rs2, freg_rd   C3

10 | rd | 110100 | - | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description FNEG copies the source floating-point register(s) to the destination floating-point
register(s), with the sign bit complemented.


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FNEGq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FNEG instruction when instruction bits 18:14 are nonzero
causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FNEG instruction causes an fp_disabled exception.


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FNEGq only))
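Because FNEG only complements the sign bit, it can be modeled on the raw register image without any floating-point arithmetic. A minimal C sketch for FNEGd, assuming the host `double` is IEEE 754 binary64 (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Copy the 64-bit image of the operand, flip bit 63, and copy it back.
   No rounding occurs, no exception flags are raised, and a NaN keeps its
   payload; only its sign bit changes, as for any other value. */
double fnegd(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT64_C(1) << 63;
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Negating +0.0 this way yields -0.0, which is why the check below divides by the result rather than comparing it with zero.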
FPACK

7.33 FPACK

Instruction  opf          Operation                                s1   s2   d    Assembly Language Syntax                 Class
FPACK16      0 0011 1011  Four 16-bit packs into 8 unsigned bits   —    f64  f32  fpack16   freg_rs2, freg_rd              C3
FPACK32      0 0011 1010  Two 32-bit packs into 8 unsigned bits    f64  f64  f64  fpack32   freg_rs1, freg_rs2, freg_rd    C3
FPACKFIX     0 0011 1101  Two 32-bit packs into 16 signed bits     —    f64  f32  fpackfix  freg_rs2, freg_rd              C3

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description The FPACK instructions convert multiple values in a source register to a
lower-precision fixed or pixel format and store the resulting values in the destination
register. Input values are clipped to the dynamic range of the output format. Packing
applies a scale factor from GSR.scale to allow flexible positioning of the binary
point. See the subsections on following pages for more detailed descriptions of the
operations of these instructions.


In an UltraSPARC Architecture 2005 implementation, these instructions are not 
implemented in hardware, cause an illegal instruction exception, and are emulated 
in software. 


An attempt to execute an FPACK16 or FPACKFIX instruction when rs1 ≠ 0 causes an
illegal_instruction exception.


Exceptions illegal_instruction


See Also FEXPAND on page 186 
FPMERGE on page 221 
7.33.1 FPACK16


FPACK16 takes four 16-bit fixed values from the 64-bit floating-point register
F_D[rs2], scales, truncates, and clips them into four 8-bit unsigned integers, and stores
the results in the 32-bit destination register, F_S[rd]. FIGURE 7-17 illustrates the
FPACK16 operation.


FIGURE 7-17 FPACK16 Operation


Note | FPACK16 ignores the most significant bit of GSR.scale 
(GSR.scale{4}). 


This operation is carried out as follows: 


1. Left-shift the value from F_D[rs2] by the number of bits specified in GSR.scale
while maintaining clipping information.


2. Truncate and clip to an 8-bit unsigned integer starting at the bit immediately to 
the left of the implicit binary point (that is, between bits 7 and 6 for each 16-bit 
word). Truncation converts the scaled value into a signed integer (that is, round 
toward negative infinity). If the resulting value is negative (that is, its most 
significant bit is set), 0 is returned as the clipped value. If the value is greater than 
255, then 255 is delivered as the clipped value. Otherwise, the scaled value is 
returned as the result. 


3. Store the result in the corresponding byte in the 32-bit destination register, F_S[rd].


For each 16-bit partition, the sequence of operations performed is shown in the 
following example pseudo-code: 


tmp ← source_operand{15:0} << GSR.scale;
// Pick off the bits from bit position 15+GSR.scale to
// bit position 7 from the shifted result
trunc_signed_value ← tmp{(15+GSR.scale):7};
if (trunc_signed_value < 0)
    unsigned_8bit_result ← 0;
else if (trunc_signed_value > 255)
    unsigned_8bit_result ← 255;
else
    unsigned_8bit_result ← tmp{14:7};
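The pseudo-code above translates directly into C. In this sketch, which illustrates the steps rather than serving as a definitive emulation routine, GSR.scale is passed in as an ordinary argument and masked to its low four bits, as the Note above describes; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(v / 2^n): selects tmp{(15+scale):7} as a signed value without
   relying on >> of negative operands. */
static int64_t floor_shift(int64_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* One FPACK16 partition, following the pseudo-code step by step. */
static uint8_t fpack16_lane(int16_t src, unsigned gsr_scale) {
    unsigned scale = gsr_scale & 0xF;                    /* GSR.scale{4} ignored */
    int64_t tmp = (int64_t)src * ((int64_t)1 << scale);  /* src << scale */
    int64_t trunc_signed_value = floor_shift(tmp, 7);    /* tmp{(15+scale):7} */
    if (trunc_signed_value < 0)
        return 0;
    if (trunc_signed_value > 255)
        return 255;
    return (uint8_t)trunc_signed_value;                  /* tmp{14:7} */
}

/* Whole instruction: the four 16-bit partitions of F_D[rs2] become the
   four bytes of F_S[rd], most significant partition to most significant byte. */
uint32_t fpack16(uint64_t rs2, unsigned gsr_scale) {
    uint32_t rd = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t bits = (uint16_t)(rs2 >> (16 * lane));
        int16_t src = bits < 0x8000 ? (int16_t)bits : (int16_t)((int32_t)bits - 0x10000);
        rd |= (uint32_t)fpack16_lane(src, gsr_scale) << (8 * lane);
    }
    return rd;
}
```

With GSR.scale = 7 the scaled value equals the source, so partitions 100, 300, -5, and 0 pack to 0x64, 0xFF, 0x00, and 0x00.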


7.33.2 FPACK32


FPACK32 takes two 32-bit fixed values from the second source operand (64-bit
floating-point register F_D[rs2]) and scales, truncates, and clips them into two 8-bit
unsigned integers. The two 8-bit integers are merged at the corresponding least
significant byte positions of each 32-bit word in the 64-bit floating-point register
F_D[rs1], left-shifted by 8 bits. The 64-bit result is stored in F_D[rd]. Thus, successive
FPACK32 instructions can assemble two pixels by using three or four pairs of 32-bit
fixed values. FIGURE 7-18 illustrates the FPACK32 operation.





FIGURE 7-18 FPACK32 Operation


This operation, illustrated in FIGURE 7-18, is carried out as follows: 


1. Left-shift each 32-bit value in F_D[rs2] by the number of bits specified in
GSR.scale, while maintaining clipping information.


2. For each 32-bit value, truncate and clip to an 8-bit unsigned integer starting at the 
bit immediately to the left of the implicit binary point (that is, between bits 23 and 
22 for each 32-bit word). Truncation is performed to convert the scaled value into 
a signed integer (that is, round toward negative infinity). If the resulting value is 
negative (that is, the most significant bit is 1), then 0 is returned as the clipped 
value. If the value is greater than 255, then 255 is delivered as the clipped value. 
Otherwise, the scaled value is returned as the result. 


3. Left-shift each 32-bit value from F_D[rs1] by 8 bits.


4. Merge the two clipped 8-bit unsigned values into the corresponding least
significant byte positions in the left-shifted F_D[rs1] values.


5. Store the result in the 64-bit destination register F_D[rd].


For each 32-bit partition, the sequence of operations performed is shown in the 
following pseudo-code: 


tmp ← source_operand2{31:0} << GSR.scale;
// Pick off the bits from bit position 31+GSR.scale to
// bit position 23 from the shifted result
trunc_signed_value ← tmp{(31+GSR.scale):23};
if (trunc_signed_value < 0)
    unsigned_8bit_value ← 0;
else if (trunc_signed_value > 255)
    unsigned_8bit_value ← 255;
else
    unsigned_8bit_value ← tmp{30:23};
Final_32bit_Result ← (source_operand1{31:0} << 8) |
                     (unsigned_8bit_value{7:0});
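A C rendering of the FPACK32 pseudo-code, applied to both 32-bit partitions, might look like the following. It uses all five bits of GSR.scale, an assumption based on the Note that only FPACK16 ignores GSR.scale{4}; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(v / 2^n) without relying on >> of negative operands. */
static int64_t floor_shift(int64_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* One 32-bit partition: clip tmp{(31+scale):23} to 0..255 and append it
   below src1 shifted left by 8 bits. */
static uint32_t fpack32_lane(uint32_t src1, uint32_t src2, unsigned gsr_scale) {
    unsigned scale = gsr_scale & 0x1F;
    int64_t s = src2 < 0x80000000u ? (int64_t)src2 : (int64_t)src2 - 4294967296LL;
    int64_t tmp = s * ((int64_t)1 << scale);        /* |tmp| <= 2^62: no overflow */
    int64_t trunc_signed_value = floor_shift(tmp, 23);
    uint32_t byte = trunc_signed_value < 0   ? 0u
                  : trunc_signed_value > 255 ? 255u
                  : (uint32_t)trunc_signed_value;
    return (uint32_t)(src1 << 8) | byte;            /* src1{31:24} is discarded */
}

uint64_t fpack32(uint64_t rs1, uint64_t rs2, unsigned gsr_scale) {
    uint64_t hi = fpack32_lane((uint32_t)(rs1 >> 32), (uint32_t)(rs2 >> 32), gsr_scale);
    uint64_t lo = fpack32_lane((uint32_t)rs1, (uint32_t)rs2, gsr_scale);
    return (hi << 32) | lo;
}
```

Calling it repeatedly with the previous result as `rs1` accumulates one clipped byte per call in each 32-bit word, which is how successive FPACK32 instructions assemble two pixels.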
7.33.3 FPACKFIX


FPACKFIX takes two 32-bit fixed values from the 64-bit floating-point register
F_D[rs2], scales, truncates, and clips them into two 16-bit signed integers, and then
stores the result in the 32-bit destination register F_S[rd]. FIGURE 7-19 illustrates the
FPACKFIX operation.
































FIGURE 7-19 FPACKFIX Operation


This operation is carried out as follows: 


1. Left-shift each 32-bit value from F_D[rs2] by the number of bits specified in
GSR.scale, while maintaining clipping information.


2. For each 32-bit value, truncate and clip to a 16-bit signed integer starting at the
bit immediately to the left of the implicit binary point (that is, between bits 16 and
15 for each 32-bit word). Truncation is performed to convert the scaled value into
a signed integer (that is, round toward negative infinity). If the resulting value is
less than -32768, then -32768 is returned as the clipped value. If the value is
greater than 32767, then 32767 is delivered as the clipped value. Otherwise, the
scaled value is returned as the result.


3. Store the result in the 32-bit destination register F_S[rd].


For each 32-bit partition, the sequence of operations performed is shown in the 
following pseudo-code: 

tmp ← source_operand{31:0} << GSR.scale;
// Pick off the bits from bit position 31+GSR.scale to
// bit position 16 from the shifted result
trunc_signed_value ← tmp{(31+GSR.scale):16};
if (trunc_signed_value < -32768)
    signed_16bit_result ← -32768;
else if (trunc_signed_value > 32767)
    signed_16bit_result ← 32767;
else
    signed_16bit_result ← tmp{31:16};
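The same approach gives a C sketch of FPACKFIX. It assumes all five bits of GSR.scale are used (only FPACK16 is documented as ignoring GSR.scale{4}) and returns each saturated 16-bit result as its two's-complement bit pattern; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(v / 2^n) without relying on >> of negative operands. */
static int64_t floor_shift(int64_t v, unsigned n) {
    return v >= 0 ? (v >> n) : -(((-v) - 1) >> n) - 1;
}

/* One partition: tmp{(31+scale):16}, saturated to -32768..32767. */
static uint16_t fpackfix_lane(uint32_t src, unsigned gsr_scale) {
    unsigned scale = gsr_scale & 0x1F;
    int64_t s = src < 0x80000000u ? (int64_t)src : (int64_t)src - 4294967296LL;
    int64_t trunc_signed_value = floor_shift(s * ((int64_t)1 << scale), 16);
    if (trunc_signed_value < -32768) trunc_signed_value = -32768;
    if (trunc_signed_value > 32767)  trunc_signed_value = 32767;
    return (uint16_t)trunc_signed_value;   /* modular conversion keeps the bit pattern */
}

/* F_D[rs2]{63:32} supplies F_S[rd]{31:16}; F_D[rs2]{31:0} supplies F_S[rd]{15:0}. */
uint32_t fpackfix(uint64_t rs2, unsigned gsr_scale) {
    uint32_t hi = fpackfix_lane((uint32_t)(rs2 >> 32), gsr_scale);
    uint32_t lo = fpackfix_lane((uint32_t)rs2, gsr_scale);
    return (hi << 16) | lo;
}
```

Note the signed clipping bounds: a large positive input saturates to 0x7FFF and a large negative one to 0x8000.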
FPADD

7.34 Fixed-point Partitioned Add

VIS 1

Instruction  opf          Operation          s1   s2   d    Assembly Language Syntax                  Class
FPADD16      0 0101 0000  Four 16-bit adds   f64  f64  f64  fpadd16   freg_rs1, freg_rs2, freg_rd     A1
FPADD16S     0 0101 0001  Two 16-bit adds    f32  f32  f32  fpadd16s  freg_rs1, freg_rs2, freg_rd     A1
FPADD32      0 0101 0010  Two 32-bit adds    f64  f64  f64  fpadd32   freg_rs1, freg_rs2, freg_rd     A1
FPADD32S     0 0101 0011  One 32-bit add     f32  f32  f32  fpadd32s  freg_rs1, freg_rs2, freg_rd     A1

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description FPADD16 (FPADD32) performs four 16-bit (two 32-bit) partitioned additions
between the corresponding fixed-point values contained in the source operands
(F_D[rs1], F_D[rs2]). The result is placed in the destination register, F_D[rd].


The 32-bit versions of these instructions (FPADD16S and FPADD32S) perform two
16-bit or one 32-bit partitioned additions.


Any carry out from each addition is discarded and a 2's-complement arithmetic 
result is produced. 
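The key property here is that each partition is added independently, so a carry never crosses a partition boundary. A minimal C sketch of FPADD16 and FPADD32 on raw 64-bit register images (function names hypothetical):

```c
#include <stdint.h>

/* FPADD16: four independent 16-bit additions. Each lane is computed and
   truncated separately, so a carry out of one lane never reaches the next. */
uint64_t fpadd16(uint64_t rs1, uint64_t rs2) {
    uint64_t rd = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(rs1 >> (16 * lane));
        uint16_t b = (uint16_t)(rs2 >> (16 * lane));
        rd |= (uint64_t)(uint16_t)(a + b) << (16 * lane);
    }
    return rd;
}

/* FPADD32: two independent 32-bit additions, each wrapping modulo 2^32. */
uint64_t fpadd32(uint64_t rs1, uint64_t rs2) {
    uint32_t hi = (uint32_t)(rs1 >> 32) + (uint32_t)(rs2 >> 32);
    uint32_t lo = (uint32_t)rs1 + (uint32_t)rs2;
    return ((uint64_t)hi << 32) | lo;
}
```

For example, 0xFFFF + 0x0001 in one 16-bit lane yields 0x0000 in that lane and leaves the neighboring lane untouched.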





FIGURE 7-20 FPADD16 Operation


FIGURE 7-21 FPADD32 Operation


FIGURE 7-22 FPADD16S Operation


FIGURE 7-23 FPADD32S Operation


CHAPTER 7 • Instructions 219


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FPADD instruction causes an fp_disabled exception.


Exceptions fp_disabled
7.35 FPMERGE


Instruction  opf          Operation           s1   s2   d    Assembly Language Syntax                 Class
FPMERGE      0 0100 1011  Two 32-bit merges   f32  f32  f64  fpmerge  freg_rs1, freg_rs2, freg_rd    C3

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description FPMERGE interleaves eight 8-bit unsigned values in F_S[rs1] and F_S[rs2] to produce
a 64-bit value in the destination register F_D[rd]. This instruction converts from
packed to planar representation when it is applied twice in succession; for example,
R1G1B1A1,R3G3B3A3 → R1R3G1G3B1B3A1A3 → R1R2R3R4G1G2G3G4.


FPMERGE also converts from planar to packed when it is applied twice in
succession; for example, R1R2R3R4,B1B2B3B4 → R1B1R2B2R3B3R4B4 →
R1G1B1A1R2G2B2A2.


FIGURE 7-24 illustrates the operation. 


FIGURE 7-24 FPMERGE Operation


CODE EXAMPLE 7-3 FPMERGE

                          ! %d0 = R1 G1 B1 A1 R2 G2 B2 A2   } packed representation
                          ! %d2 = R3 G3 B3 A3 R4 G4 B4 A4   }
fpmerge %f0, %f2, %d4     ! R1 R3 G1 G3 B1 B3 A1 A3         } intermediate
fpmerge %f1, %f3, %d6     ! R2 R4 G2 G4 B2 B4 A2 A4         }
fpmerge %f4, %f6, %d0     ! R1 R2 R3 R4 G1 G2 G3 G4         } planar representation
fpmerge %f5, %f7, %d2     ! B1 B2 B3 B4 A1 A2 A3 A4         }
fpmerge %f0, %f2, %d4     ! R1 B1 R2 B2 R3 B3 R4 B4         } intermediate
fpmerge %f1, %f3, %d6     ! G1 A1 G2 A2 G3 A3 G4 A4         }
fpmerge %f4, %f6, %d0     ! R1 G1 B1 A1 R2 G2 B2 A2         } packed representation
fpmerge %f5, %f7, %d2     ! R3 G3 B3 A3 R4 G4 B4 A4         }


In an UltraSPARC Architecture 2005 implementation, these instructions are not 
implemented in hardware, cause an illegal instruction exception, and are emulated 
in software. 
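Because FPMERGE is emulated in software on these implementations, its byte interleave is worth stating precisely. A minimal C sketch on raw register images, with a hypothetical function name:

```c
#include <stdint.h>

/* Interleave the bytes of two 32-bit F_S register images into a 64-bit
   F_D image, most significant bytes first:
   rd = s1[3] s2[3] s1[2] s2[2] s1[1] s2[1] s1[0] s2[0]   (index 3 = MSB) */
uint64_t fpmerge(uint32_t rs1, uint32_t rs2) {
    uint64_t rd = 0;
    for (int i = 3; i >= 0; i--) {
        rd = (rd << 8) | ((rs1 >> (8 * i)) & 0xFFu);
        rd = (rd << 8) | ((rs2 >> (8 * i)) & 0xFFu);
    }
    return rd;
}
```

Applying it twice reproduces the packed-to-planar step of CODE EXAMPLE 7-3: if pN is pixel N's RGBA word, merging the upper halves of fpmerge(p1, p3) and fpmerge(p2, p4) yields R1 R2 R3 R4 G1 G2 G3 G4.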


Exceptions illegal_instruction


See Also FPACK on page 212 
FEXPAND on page 186 
FPSUB

7.36 Fixed-point Partitioned Subtract (64-bit)


VIS 1


Instruction  opf          Operation               s1   s2   d    Assembly Language Syntax                  Class
FPSUB16      0 0101 0100  Four 16-bit subtracts   f64  f64  f64  fpsub16   freg_rs1, freg_rs2, freg_rd     A1
FPSUB16S     0 0101 0101  Two 16-bit subtracts    f32  f32  f32  fpsub16s  freg_rs1, freg_rs2, freg_rd     A1
FPSUB32      0 0101 0110  Two 32-bit subtracts    f64  f64  f64  fpsub32   freg_rs1, freg_rs2, freg_rd     A1
FPSUB32S     0 0101 0111  One 32-bit subtract     f32  f32  f32  fpsub32s  freg_rs1, freg_rs2, freg_rd     A1

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description FPSUB16 (FPSUB32) performs four 16-bit (two 32-bit) partitioned subtractions
between the corresponding fixed-point values contained in the source operands
(F_D[rs1], F_D[rs2]). The values in F_D[rs2] are subtracted from those in F_D[rs1], and
the result is placed in the destination register, F_D[rd].


The 32-bit versions of these instructions (FPSUB16S and FPSUB32S) perform two
16-bit or one 32-bit partitioned subtractions.


Any carry out from each subtraction is discarded and a 2’s-complement arithmetic 
result is produced. 
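As with FPADD, each partition is independent, so a borrow never propagates into the neighboring partition. A minimal C sketch of FPSUB16 and FPSUB32S (function names hypothetical):

```c
#include <stdint.h>

/* FPSUB16: four independent 16-bit subtractions F_D[rs1] - F_D[rs2];
   each lane wraps modulo 2^16 and borrows never cross lane boundaries. */
uint64_t fpsub16(uint64_t rs1, uint64_t rs2) {
    uint64_t rd = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(rs1 >> (16 * lane));
        uint16_t b = (uint16_t)(rs2 >> (16 * lane));
        rd |= (uint64_t)(uint16_t)(a - b) << (16 * lane);
    }
    return rd;
}

/* FPSUB32S: one 32-bit subtraction on F_S register images, wrapping mod 2^32. */
uint32_t fpsub32s(uint32_t rs1, uint32_t rs2) {
    return rs1 - rs2;
}
```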





FIGURE 7-25 FPSUB16 Operation


FIGURE 7-26 FPSUB32 Operation


FIGURE 7-27 FPSUB16S Operation


FIGURE 7-28 FPSUB32S Operation


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FPSUB instruction causes an fp_disabled exception.


Exceptions fp_disabled
7.37 F Register Logical Operate (1 operand)


Instruction  opf          Operation           Assembly Language Syntax   Class
FZERO        0 0110 0000  Zero fill           fzero   freg_rd            A1
FZEROs       0 0110 0001  Zero fill, 32-bit   fzeros  freg_rd            A1
FONE         0 0111 1110  One fill            fone    freg_rd            A1
FONEs        0 0111 1111  One fill, 32-bit    fones   freg_rd            A1

10 | rd | 110110 | - | opf | -
31 30 29 25 24 19 18 14 13 5 4 0
Description FZERO and FONE fill the 64-bit destination register, F_D[rd], with all '0' bits or all '1'
bits (respectively).


FZEROs and FONEs fill the 32-bit destination register, F_S[rd], with all '0' bits or all
'1' bits (respectively).


An attempt to execute an FZERO or FONE instruction when instruction bits 18:14 or 
bits 4:0 are nonzero causes an illegal instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FZERO[s] or FONE[s] instruction causes an fp_disabled
exception.


Exceptions illegal_instruction
fp_disabled


See Also F Register 2-operand Logical Operations on page 227 
F Register 3-operand Logical Operations on page 229 
F Register 2-operand Logical Ops

7.38 F Register Logical Operate (2 operand)

Instruction  opf          Operation                                  Assembly Language Syntax   Class
FSRC1        0 0111 0100  Copy F_D[rs1] to F_D[rd]                   fsrc1   freg_rs1, freg_rd  A1
FSRC1s       0 0111 0101  Copy F_S[rs1] to F_S[rd], 32-bit           fsrc1s  freg_rs1, freg_rd  A1
FSRC2        0 0111 1000  Copy F_D[rs2] to F_D[rd]                   fsrc2   freg_rs2, freg_rd  A1
FSRC2s       0 0111 1001  Copy F_S[rs2] to F_S[rd], 32-bit           fsrc2s  freg_rs2, freg_rd  A1
FNOT1        0 0110 1010  Negate (1's complement) F_D[rs1]           fnot1   freg_rs1, freg_rd  A1
FNOT1s       0 0110 1011  Negate (1's complement) F_S[rs1], 32-bit   fnot1s  freg_rs1, freg_rd  A1
FNOT2        0 0110 0110  Negate (1's complement) F_D[rs2]           fnot2   freg_rs2, freg_rd  A1
FNOT2s       0 0110 0111  Negate (1's complement) F_S[rs2], 32-bit   fnot2s  freg_rs2, freg_rd  A1

10 | rd | 110110 | rs1 | opf | -      (FSRC1[s], FNOT1[s])
10 | rd | 110110 | - | opf | rs2      (FSRC2[s], FNOT2[s])
31 30 29 25 24 19 18 14 13 5 4 0
Description The standard 64-bit versions of these instructions perform one of four 64-bit logical
operations on the 64-bit floating-point register F_D[rs1] (or F_D[rs2]) and store the
result in the 64-bit floating-point destination register F_D[rd].
The 32-bit (single-precision) versions of these instructions perform 32-bit logical
operations on F_S[rs1] (or F_S[rs2]) and store the result in F_S[rd].
An attempt to execute an FSRC1(s) or FNOT1(s) instruction when instruction bits 4:0
are nonzero causes an illegal_instruction exception. An attempt to execute an
FSRC2(s) or FNOT2(s) instruction when instruction bits 18:14 are nonzero causes an
illegal_instruction exception.
If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FSRC1[s], FNOT1[s], FSRC2[s], or FNOT2[s] instruction causes
an fp_disabled exception.
Programming | FSRC1s (FSRC1) functions similarly to FMOVs (FMOVd), except
Note        | that FSRC1s (FSRC1) does not modify the FSR register while
              FMOVs (FMOVd) update some fields of FSR (see Floating-Point
              Move on page 193). Programmers are encouraged to use FMOVs
              (FMOVd) instead of FSRC1s (FSRC1) whenever practical.
Exceptions illegal_instruction
fp_disabled
See Also Floating-Point Move on page 193 
F Register 1-operand Logical Operations on page 226 
F Register 3-operand Logical Operations on page 229 
F Register 3-operand Logical Ops

7.39 F Register Logical Operate (3 operand)

Instruction  opf          Operation                              Assembly Language Syntax                   Class
FOR          0 0111 1100  Logical or                             for        freg_rs1, freg_rs2, freg_rd     A1
FORs         0 0111 1101  Logical or, 32-bit                     fors       freg_rs1, freg_rs2, freg_rd     A1
FNOR         0 0110 0010  Logical nor                            fnor       freg_rs1, freg_rs2, freg_rd     A1
FNORs        0 0110 0011  Logical nor, 32-bit                    fnors      freg_rs1, freg_rs2, freg_rd     A1
FAND         0 0111 0000  Logical and                            fand       freg_rs1, freg_rs2, freg_rd     A1
FANDs        0 0111 0001  Logical and, 32-bit                    fands      freg_rs1, freg_rs2, freg_rd     A1
FNAND        0 0110 1110  Logical nand                           fnand      freg_rs1, freg_rs2, freg_rd     A1
FNANDs       0 0110 1111  Logical nand, 32-bit                   fnands     freg_rs1, freg_rs2, freg_rd     A1
FXOR         0 0110 1100  Logical xor                            fxor       freg_rs1, freg_rs2, freg_rd     A1
FXORs        0 0110 1101  Logical xor, 32-bit                    fxors      freg_rs1, freg_rs2, freg_rd     A1
FXNOR        0 0111 0010  Logical xnor                           fxnor      freg_rs1, freg_rs2, freg_rd     A1
FXNORs       0 0111 0011  Logical xnor, 32-bit                   fxnors     freg_rs1, freg_rs2, freg_rd     A1
FORNOT1      0 0111 1010  (not F[rs1]) or F[rs2]                 fornot1    freg_rs1, freg_rs2, freg_rd     A1
FORNOT1s     0 0111 1011  (not F[rs1]) or F[rs2], 32-bit         fornot1s   freg_rs1, freg_rs2, freg_rd     A1
FORNOT2      0 0111 0110  F[rs1] or (not F[rs2])                 fornot2    freg_rs1, freg_rs2, freg_rd     A1
FORNOT2s     0 0111 0111  F[rs1] or (not F[rs2]), 32-bit         fornot2s   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT1     0 0110 1000  (not F[rs1]) and F[rs2]                fandnot1   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT1s    0 0110 1001  (not F[rs1]) and F[rs2], 32-bit        fandnot1s  freg_rs1, freg_rs2, freg_rd     A1
FANDNOT2     0 0110 0100  F[rs1] and (not F[rs2])                fandnot2   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT2s    0 0110 0101  F[rs1] and (not F[rs2]), 32-bit        fandnot2s  freg_rs1, freg_rs2, freg_rd     A1

10 | rd | 110110 | rs1 | opf | rs2
31 30 29 25 24 19 18 14 13 5 4 0
Description The standard 64-bit versions of these instructions perform one of ten 64-bit logical
operations between the 64-bit floating-point registers F_D[rs1] and F_D[rs2]. The result
is stored in the 64-bit floating-point destination register F_D[rd].
The 32-bit (single-precision) versions of these instructions perform 32-bit logical
operations between F_S[rs1] and F_S[rs2], storing the result in F_S[rd].
If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute any 3-operand F Register Logical Operate instruction causes an 
fp_disabled exception. 
Exceptions fp_disabled 
See Also F Register 1-operand Logical Operations on page 226 


F Register 2-operand Logical Operations on page 227 
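As an illustration, the ten logical operations in the table can be modelled directly on raw register bit patterns. The C sketch below is a hypothetical model, not part of the architecture: it assumes the contents of an FD register are held in a uint64_t (and an FS register in a uint32_t), and the enum and function names are invented for this example. The enum values are the opf codes of the 64-bit forms from the table; each 32-bit form uses the next opf value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values are the opf codes of the 64-bit forms. */
enum flogic_op {
    OP_FNOR     = 0x062,
    OP_FANDNOT2 = 0x064,
    OP_FANDNOT1 = 0x068,
    OP_FXOR     = 0x06C,
    OP_FNAND    = 0x06E,
    OP_FAND     = 0x070,
    OP_FXNOR    = 0x072,
    OP_FORNOT2  = 0x076,
    OP_FORNOT1  = 0x07A,
    OP_FOR      = 0x07C
};

/* Models FD[rd] <- FD[rs1] op FD[rs2]; a is F[rs1], b is F[rs2]. */
static uint64_t flogic64(enum flogic_op op, uint64_t a, uint64_t b)
{
    switch (op) {
    case OP_FOR:      return a | b;
    case OP_FNOR:     return ~(a | b);
    case OP_FAND:     return a & b;
    case OP_FNAND:    return ~(a & b);
    case OP_FXOR:     return a ^ b;
    case OP_FXNOR:    return ~(a ^ b);
    case OP_FORNOT1:  return ~a | b;
    case OP_FORNOT2:  return a | ~b;
    case OP_FANDNOT1: return ~a & b;
    case OP_FANDNOT2: return a & ~b;
    }
    assert(0 && "not a 3-operand logical opf");
    return 0;
}

/* The 32-bit forms apply the same operation to FS registers; computing in
   64 bits and keeping the low 32 bits gives the same result. */
static uint32_t flogic32(enum flogic_op op, uint32_t a, uint32_t b)
{
    return (uint32_t)flogic64(op, a, b);
}
```

For example, flogic64(OP_FANDNOT1, a, b) corresponds to fandnot1 with F[rs1] = a and F[rs2] = b, yielding (not a) and b.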
FSQRT<s|d|q> Instructions 





7.40 Floating-Point Square Root 





Instruction  op3      opf          Operation           Assembly Language Syntax   Class
FSQRTs       11 0100  0 0010 1001  Square Root Single  fsqrts  freg_rs2, freg_rd  A1
FSQRTd       11 0100  0 0010 1010  Square Root Double  fsqrtd  freg_rs2, freg_rd  A1
FSQRTq       11 0100  0 0010 1011  Square Root Quad    fsqrtq  freg_rs2, freg_rd  C3

Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) = 110100 | reserved (18:14) | opf (13:5) | rs2 (4:0)


Description These SPARC V9 instructions generate the square root of the floating-point operand 
in the floating-point register(s) specified by the rs2 field and place the result in the 
destination floating-point register(s) specified by the rd field. Rounding is performed 
as specified by FSR.rd. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FSQRTq instruction causes an 
illegal_instruction exception, allowing privileged software to 
emulate the instruction. 


An attempt to execute an FSQRT instruction when instruction bits 18:14 are nonzero 
causes an illegal_instruction exception. 
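The encoding rules stated so far (op = 10, op3 = 110100, the opf values in the table, and bits 18:14 required to be zero) can be checked mechanically. The C sketch below is illustrative only: the function and enum names are hypothetical, it treats FSQRTq as raising illegal_instruction as the preceding Note says, and it does not model the fp_disabled check.

```c
#include <assert.h>
#include <stdint.h>

enum fsqrt_check { NOT_FSQRT, FSQRT_OK, FSQRT_ILLEGAL_INSTRUCTION };

/* Extract bits hi..lo (inclusive) of a 32-bit instruction word. */
static uint32_t bits(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    return (uint32_t)((word >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1));
}

/* Hypothetical encoder: rd and rs2 are the 5-bit register fields. */
static uint32_t encode_fsqrt(uint32_t opf, uint32_t rd, uint32_t rs2)
{
    assert(opf < 512 && rd < 32 && rs2 < 32);
    return (2u << 30) | (rd << 25) | (0x34u << 19) | (opf << 5) | rs2;
}

/* Classifies a word against the FSQRT rules stated in this section. */
static enum fsqrt_check check_fsqrt(uint32_t word)
{
    uint32_t opf = bits(word, 13, 5);

    if (bits(word, 31, 30) != 2 || bits(word, 24, 19) != 0x34 ||
        opf < 0x29 || opf > 0x2B)
        return NOT_FSQRT;                 /* not FSQRTs, FSQRTd, or FSQRTq */
    if (bits(word, 18, 14) != 0)
        return FSQRT_ILLEGAL_INSTRUCTION; /* bits 18:14 must be zero */
    if (opf == 0x2B)
        return FSQRT_ILLEGAL_INSTRUCTION; /* FSQRTq: no quad support in hardware */
    return FSQRT_OK;
}
```

As a worked example of the field layout, encode_fsqrt(0x29, 3, 1) assembles fsqrts %f1, %f3 (rs2 = 1, rd = 3) into the word 87A00521₁₆.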


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FSQRT instruction causes an fp_disabled exception. 


If the FPU is enabled, an fp_exception_other exception (with FSR.ftt = unimplemented_FPop) 
occurs, since the FSQRT instructions are not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FSQRT is not implemented 
in hardware)) 
F<s|d|q>TO<i|x> 





7.41 Convert Floating-Point to Integer 


Instruction  opf          Operation                         s1  s2    d    Assembly Language Syntax  Class
FsTOx        0 1000 0001  Convert Single to 64-bit Integer  -   f32   i64  fstox  freg_rs2, freg_rd  A1
FdTOx        0 1000 0010  Convert Double to 64-bit Integer  -   f64   i64  fdtox  freg_rs2, freg_rd  A1
FqTOx        0 1000 0011  Convert Quad to 64-bit Integer    -   f128  i64  fqtox  freg_rs2, freg_rd  C3
FsTOi        0 1101 0001  Convert Single to 32-bit Integer  -   f32   i32  fstoi  freg_rs2, freg_rd  A1
FdTOi        0 1101 0010  Convert Double to 32-bit Integer  -   f64   i32  fdtoi  freg_rs2, freg_rd  A1
FqTOi        0 1101 0011  Convert Quad to 32-bit Integer    -   f128  i32  fqtoi  freg_rs2, freg_rd  C3











Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) = 110100 | reserved (18:14) | opf (13:5) | rs2 (4:0)

Description FsTOx, FdTOx, and FqTOx convert the floating-point operand in the floating-point 
register(s) specified by rs2 to a 64-bit integer in the floating-point register FD[rd]. 


FsTOi, FdTOi, and FqTOi convert the floating-point operand in the floating-point 
register(s) specified by rs2 to a 32-bit integer in the floating-point register FS[rd]. 


The result is always rounded toward zero; that is, the rounding direction (rd) field of 
the FSR register is ignored. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute a FqTOx or FqTOi instruction 
causes an illegal_instruction exception, allowing privileged 
software to emulate the instruction. 


An attempt to execute an F<s|d|q>TO<i|x> instruction when instruction bits 18:14 
are nonzero causes an illegal_instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an F<s|d|q>TO<i|x> instruction causes an fp_disabled 
exception. 


If the FPU is enabled, FqTOi and FqTOx cause fp_exception_other (with FSR.ftt = 
unimplemented_FPop), since those instructions are not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 


If the floating-point operand's value is too large to be converted to an integer of the 
specified size or is a NaN or infinity, then an fp_exception_ieee_754 "invalid" 
exception occurs. The value written into the floating-point register(s) specified by rd 
in these cases is as defined in Integer Overflow Definition on page 389. 
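The conversion rule above (truncate toward zero, ignore FSR.rd, and signal invalid for a NaN, an infinity, or an out-of-range value) can be sketched in C for the FdTOi case. This is an illustrative host-side model, not the hardware algorithm. The struct and function names are hypothetical, and the value returned in the invalid case (saturation by sign) is an assumption of this sketch; the architected value is the one given in Integer Overflow Definition on page 389.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fdtoi_result {
    int32_t value;   /* value that would be written to FS[rd] */
    bool invalid;    /* would raise fp_exception_ieee_754 (NV) */
    bool inexact;    /* fractional bits were discarded (NX) */
};

static struct fdtoi_result model_fdtoi(double x)
{
    struct fdtoi_result r = { 0, false, false };

    /* In range iff x truncated toward zero fits in int32_t.
       NaN fails both comparisons and infinities fail one of them. */
    if (!(x > -2147483649.0 && x < 2147483648.0)) {
        r.invalid = true;
        /* ASSUMPTION: saturate by sign (NaN maps to INT32_MAX here). */
        r.value = (x < 0) ? INT32_MIN : INT32_MAX;
        return r;
    }
    r.value = (int32_t)x;                 /* C conversion truncates toward zero */
    r.inexact = ((double)r.value != x);
    return r;
}
```

With this model, 2.9 and -2.9 convert to 2 and -2 respectively, regardless of the rounding mode in FSR.rd.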


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FqTOx, FqTOi only)) 
fp_exception_ieee_754 (NV, NX) 
F<s|d|q>TO<s|d|q> 





7.42 Convert Between Floating-Point Formats 


Instruction  op3      opf          Operation                 s1  s2    d     Assembly Language Syntax  Class
FsTOd        11 0100  0 1100 1001  Convert Single to Double  -   f32   f64   fstod  freg_rs2, freg_rd  A1
FsTOq        11 0100  0 1100 1101  Convert Single to Quad    -   f32   f128  fstoq  freg_rs2, freg_rd  C3
FdTOs        11 0100  0 1100 0110  Convert Double to Single  -   f64   f32   fdtos  freg_rs2, freg_rd  A1
FdTOq        11 0100  0 1100 1110  Convert Double to Quad    -   f64   f128  fdtoq  freg_rs2, freg_rd  C3
FqTOs        11 0100  0 1100 0111  Convert Quad to Single    -   f128  f32   fqtos  freg_rs2, freg_rd  C3
FqTOd        11 0100  0 1100 1011  Convert Quad to Double    -   f128  f64   fqtod  freg_rs2, freg_rd  C3





Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) = 110100 | reserved (18:14) | opf (13:5) | rs2 (4:0)


Description These instructions convert the floating-point operand in the floating-point register(s) 
specified by rs2 to a floating-point number in the destination format. They write the 
result into the floating-point register(s) specified by rd. 


The value of FSR.rd determines how rounding is performed by these instructions. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FsTOq, FdTOq, FqTOs, or 
FqTOd instruction causes an illegal_instruction exception, allowing 
privileged software to emulate the instruction. 


An attempt to execute an F<s|d|q>TO<s|d|q> instruction when instruction bits 
18:14 are nonzero causes an illegal_instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an F<s|d|q>TO<s|d|q> instruction causes an fp_disabled 
exception. 


If the FPU is enabled, FsTOq, FdTOq, FqTOs, and FqTOd cause fp_exception_other 
(with FSR.ftt = unimplemented_FPop), since those instructions are not implemented 
in hardware in UltraSPARC Architecture 2005 implementations. 


FqTOd, FqTOs, and FdTOs (the "narrowing" conversion instructions) can cause 
fp_exception_ieee_754 OF, UF, and NX exceptions. FdTOq, FsTOq, and FsTOd (the 
"widening" conversion instructions) cannot. 
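The narrowing/widening distinction can be seen on an ordinary host using C's float and double, assuming (as on SPARC) that they are IEEE 754 single and double precision. This is a host-side illustration, not a model of the FPU, and the helper names are hypothetical: a double whose round trip through float changes value is exactly a case in which FdTOs had to round and would report NX.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Widening, as in FsTOd: every float value is exactly representable as a double. */
static double widen(float f)
{
    return (double)f;
}

/* Narrowing, as in FdTOs: true if rounding occurred (the NX case).
   Restricted to |d| <= FLT_MAX because ISO C leaves out-of-range
   double-to-float conversion undefined, whereas FdTOs would signal OF. */
static bool narrowing_is_inexact(double d)
{
    assert(d >= -FLT_MAX && d <= FLT_MAX);
    float f = (float)d;
    return (double)f != d;
}
```

For example, the double nearest 0.1 is inexact when narrowed, while 0.5, or any value that was itself produced by widening a float, narrows exactly.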


Any of these six instructions can trigger an fp_exception_ieee_754 NV exception if 
the source operand is a signaling NaN. 
Note | For FdTOs and FsTOd, an fp_exception_other with 
FSR.ftt = unfinished_FPop can occur if implementation-dependent 
conditions are detected during the conversion operation. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FsTOq, FqTOs, FdTOq, 
and FqTOd only)) 
fp_exception_other (FSR.ftt = unfinished_FPop) 
fp_exception_ieee_754 (NV) 
fp_exception_ieee_754 (OF, UF, NX (FqTOd, FqTOs, and FdTOs)) 
FSUB 





7.43 Floating-Point Subtract 


Instruction  op3      opf          Operation        Assembly Language Syntax           Class
FSUBs        11 0100  0 0100 0101  Subtract Single  fsubs  freg_rs1, freg_rs2, freg_rd  A1
FSUBd        11 0100  0 0100 0110  Subtract Double  fsubd  freg_rs1, freg_rs2, freg_rd  A1
FSUBq        11 0100  0 0100 0111  Subtract Quad    fsubq  freg_rs1, freg_rs2, freg_rd  C3





Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) = 110100 | rs1 (18:14) | opf (13:5) | rs2 (4:0)

Description The floating-point subtract instructions subtract the floating-point register(s) 
specified by the rs2 field from the floating-point register(s) specified by the rs1 field. 
The instructions then write the difference into the floating-point register(s) specified 
by the rd field. 

Rounding is performed as specified by FSR.rd. 

Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FSUBq instruction causes an 
illegal_instruction exception, allowing privileged software to 
emulate the instruction. 

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FSUB instruction causes an fp_disabled exception. 

If the FPU is enabled, FSUBq causes an fp_exception_other (with FSR.ftt = 
unimplemented_FPop), since that instruction is not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 

Note | An fp_exception_other with FSR.ftt = unfinished_FPop can occur 
if the operation detects unusual, implementation-specific 
conditions (for FSUBs or FSUBd). 

For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-1985 
Requirements for UltraSPARC Architecture 2005. 

Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FSUBq)) 
fp_exception_other (FSR.ftt = unfinished_FPop) 
fp_exception_ieee_754 (OF, UF, NX, NV) 
FxTO<s|d|q> 





7.44 Convert 64-bit Integer to Floating Point 


Instruction  op3      opf          Operation                         s1  s2   d     Assembly Language Syntax  Class
FxTOs        11 0100  0 1000 0100  Convert 64-bit Integer to Single  -   i64  f32   fxtos  freg_rs2, freg_rd  A1
FxTOd        11 0100  0 1000 1000  Convert 64-bit Integer to Double  -   i64  f64   fxtod  freg_rs2, freg_rd  A1
FxTOq        11 0100  0 1000 1100  Convert 64-bit Integer to Quad    -   i64  f128  fxtoq  freg_rs2, freg_rd  C3

Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) | reserved (18:14) | opf (13:5) | rs2 (4:0)


Description FxTOs, FxTOd, and FxTOq convert the 64-bit signed integer operand in the 
floating-point register FD[rs2] into a floating-point number in the destination format. 


All write their result into the floating-point register(s) specified by rd. 
The value of FSR.rd determines how rounding is performed by FxTOs and FxTOd. 
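Rounding matters for FxTOs and FxTOd because a 64-bit integer can have more significant bits than a double's 53-bit significand (or a float's 24-bit one). The host-side C sketch below, with a hypothetical helper name, assumes IEEE 754 double precision and the default round-to-nearest mode; it reports whether FxTOd would have to round a given operand, which is the situation in which NX is signaled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if converting x to double must round (the NX case for FxTOd). */
static bool fxtod_is_inexact(int64_t x)
{
    double d = (double)x;
    /* 2^63 is not representable as int64_t, so handle it before
       converting back; only INT64_MAX-adjacent values round up to it. */
    if (d >= 9223372036854775808.0)
        return true;
    return (int64_t)d != x;
}
```

For example, 2^53 + 1 converts to exactly 2^53 under round-to-nearest-even, so FxTOd would report NX for that operand, while 2^53 itself converts exactly.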


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FxTOq instruction causes an 
illegal_instruction exception, allowing privileged software to 
emulate the instruction. 


An attempt to execute an FxTO<s|d|q> instruction when instruction bits 18:14 are 
nonzero causes an illegal_instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FxTO<s|d|q> instruction causes an fp_disabled exception. 


If the FPU is enabled, FxTOq causes an fp_exception_other (with FSR.ftt = 
unimplemented_FPop), since that instruction is not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FxTOq only)) 
fp_exception_ieee_754 (NX (FxTOs and FxTOd only)) 
ILLTRAP 





7.45 Illegal Instruction Trap 


Instruction  op  op2  Operation                 Assembly Language Syntax  Class
ILLTRAP      00  000  illegal instruction trap  illtrap const22           A1

Format: op (31:30) = 00 | reserved (29:25) | op2 (24:22) = 000 | const22 (21:0)


Description The ILLTRAP instruction causes an illegal_instruction exception. The const22 value 
in the instruction is ignored by the virtual processor; specifically, this field is not 
reserved by the architecture for any future use. 


V9 Compatibility Note | Except for its name, this instruction is identical to the SPARC V8 
UNIMP instruction. 


An attempt to execute an ILLTRAP instruction when reserved instruction bits 29:25 
are nonzero (also) causes an illegal_instruction exception. However, software should 
not rely on this behavior, because a future version of the architecture may use 
nonzero values of bits 29:25 to encode other functions. 
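Because op and op2 are both zero, an ILLTRAP word with its reserved bits 29:25 left at zero is numerically equal to const22. The C sketch below, with hypothetical helper names, builds and recognizes such words and flags the reserved-bits case that, as stated above, software should not depend on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build "illtrap const22": op = 00, bits 29:25 = 0, op2 = 000. */
static uint32_t encode_illtrap(uint32_t const22)
{
    assert(const22 < (1u << 22));
    return (0u << 30) | (0u << 25) | (0u << 22) | const22;
}

/* True if the word has op = 00 and op2 = 000, whatever bits 29:25 hold. */
static bool is_illtrap(uint32_t word)
{
    return ((word >> 30) & 0x3u) == 0 && ((word >> 22) & 0x7u) == 0;
}

/* True if the reserved bits 29:25 are nonzero. */
static bool illtrap_reserved_bits_set(uint32_t word)
{
    return ((word >> 25) & 0x1Fu) != 0;
}
```

In particular, the all-zero word 00000000₁₆ decodes as illtrap 0.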


Exceptions illegal_instruction 
IMPDEP 





7.46 Implementation-Dependent Instructions 


Instruction  op3      op4    Operation                                Class
IMPDEP1      11 0110  (any)  Implementation-Dependent Instruction 1   N3
IMPDEP2A     11 0111  0      Implementation-Dependent Instruction 2A  N3
IMPDEP2B     11 0111  1,2,3  Implementation-Dependent Instruction 2B  N3





Format: op (31:30) = 10 | impl. dep. (29:25) | op3 (24:19) | impl. dep. (18:7) | op4 (6:5) | impl. dep. (4:0)

Description IMPL. DEP. #106-V9: The IMPDEP2A opcode space is completely implementation 
dependent. Implementation-dependent aspects of IMPDEP2A instructions include 
their operation, the interpretation of bits 29:25, 18:7, and 4:0 in their encodings, 
and which (if any) exceptions they may cause. 

IMPDEP2B opcodes are reserved; see IMPDEP2B Opcodes on page 239. 

See "Implementation-Dependent and Reserved Opcodes" in the "Extending the 
UltraSPARC Architecture" section of the separate document UltraSPARC Architecture 
Application Notes for information about extending the instruction set by means of 
implementation-dependent instructions. 

Compatibility Note | IMPDEP2A and IMPDEP2B are subsets of the SPARC V9 
IMPDEP2 opcode space. The IMPDEP1 opcode space from 
SPARC V9 is occupied by various VIS instructions in the 
UltraSPARC Architecture, so it should not be used for 
implementation-dependent instructions. 

Exceptions implementation-dependent (IMPDEP2A, IMPDEP2B) 

7.46.1 IMPDEP1 Opcodes [VIS 1, 2] 


All operands of instructions using IMPDEP1 opcodes are in floating-point registers, 
unless otherwise specified. Pixel values are stored in single-precision floating-point 
registers and fixed values are stored in double-precision floating-point registers, 
unless otherwise specified. 


Note | All IMPDEP1 instructions, regardless of whether they use 
floating-point registers or integer registers, leave FSR.cexc and 
FSR.aexc unchanged. 
7.46.1.1 Opcode Formats 

Most of the VIS instruction set maps to the opcode space reserved for the 
Implementation-Dependent Instruction 1 (op3 = IMPDEP1 = 36₁₆) instructions. 

7.46.2 IMPDEP2B Opcodes 

No instructions are currently encoded in the IMPDEP2B opcode space; it is a 
reserved opcode space. 
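A decoder can tell the three IMPDEP spaces apart using only op, op3, and (for op3 = 11 0111) op4 in bits 6:5, as listed in the table at the start of this section. The C sketch below uses hypothetical enum and function names and deliberately inspects no other fields, since everything else in these words is implementation dependent or reserved.

```c
#include <assert.h>
#include <stdint.h>

enum impdep_space { NOT_IMPDEP, SPACE_IMPDEP1, SPACE_IMPDEP2A, SPACE_IMPDEP2B };

/* Uses only op (31:30), op3 (24:19), and op4 (6:5). */
static enum impdep_space classify_impdep(uint32_t word)
{
    uint32_t op  = (word >> 30) & 0x3u;
    uint32_t op3 = (word >> 19) & 0x3Fu;
    uint32_t op4 = (word >> 5) & 0x3u;

    if (op != 2)
        return NOT_IMPDEP;
    if (op3 == 0x36)                      /* 11 0110: occupied by VIS */
        return SPACE_IMPDEP1;
    if (op3 == 0x37)                      /* 11 0111: split by op4 */
        return op4 == 0 ? SPACE_IMPDEP2A : SPACE_IMPDEP2B;
    return NOT_IMPDEP;
}
```

A VIS word such as FAND (op3 = 36₁₆, opf = 0 0111 0000) classifies as SPACE_IMPDEP1 here, which is why the Compatibility Note above advises against using that space for new implementation-dependent instructions.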
INVALW 





7.47 Mark Register Window Sets as "Invalid" 





Instruction  Operation                                   Assembly Language Syntax  Class
INVALW^P     Mark all register window sets as "invalid"  invalw                    A1

Format: op (31:30) = 10 | fcn (29:25) | op3 (24:19) | reserved (18:0)


Description The INVALW instruction marks all register window sets as "invalid"; specifically, it 
atomically performs the following operations: 

CANSAVE ← N_REG_WINDOWS - 2 
CANRESTORE ← 0 
OTHERWIN ← 0 
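The three assignments above can be modelled as computing one new window-state value. The C sketch below is illustrative: the struct, its fields, and the n_reg_windows parameter are hypothetical stand-ins for CANSAVE, CANRESTORE, OTHERWIN, and N_REG_WINDOWS, and the atomicity requirement is represented only by returning the complete new state at once. The SAVE-counting helper is a deliberate simplification (it assumes a SAVE avoids a spill trap exactly while CANSAVE > 0 and ignores CLEANWIN), included to illustrate the Programming Notes claim below.

```c
#include <assert.h>
#include <stdint.h>

struct window_state {
    uint32_t cansave;
    uint32_t canrestore;
    uint32_t otherwin;
};

/* Models INVALW: the prior window state does not affect the result. */
static struct window_state model_invalw(uint32_t n_reg_windows)
{
    assert(n_reg_windows >= 2);           /* keeps N_REG_WINDOWS - 2 nonnegative */
    struct window_state next = { n_reg_windows - 2, 0, 0 };
    return next;
}

/* Simplified: each SAVE moves one window from CANSAVE to CANRESTORE and
   is assumed to avoid a spill trap while CANSAVE > 0. */
static unsigned saves_before_spill(struct window_state ws)
{
    unsigned n = 0;
    while (ws.cansave > 0) {
        ws.cansave--;
        ws.canrestore++;
        n++;
    }
    return n;
}
```

With 8 register windows, for example, the model yields CANSAVE = 6, so six SAVEs fit before a spill would be needed.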


Programming Notes | INVALW marks all windows as invalid; after executing INVALW, 
N_REG_WINDOWS-2 SAVEs can be performed without generating a 
spill trap. This instruction allows window manipulations to be 
atomic, without the value of N_REG_WINDOWS being visible to 
privileged software and without an assumption that 
N_REG_WINDOWS is constant (since hyperprivileged software can 
migrate a thread among virtual processors, across which 
N_REG_WINDOWS may vary). 





In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware; it causes an illegal_instruction exception and is emulated 
in software. 


Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005) 


See Also ALLCLEAN on page 150 
NORMALW on page 289 
OTHERW on page 291 
RESTORED on page 311 
SAVED on page 319 
JMPL 





7.48 Jump and Link 


Instruction  op3      Operation      Assembly Language Syntax  Class
JMPL         11 1000  Jump and Link  jmpl address, reg_rd      A1





Format: op (31:30) = 10 | rd (29:25) | op3 (24:19) = 111000 | rs1 (18:14) | i (13) = 0 | reserved (12:5) | rs2 (4:0)
        op (31:30) = 10 | rd (29:25) | op3 (24:19) = 111000 | rs1 (18:14) | i (13) = 1 | simm13 (12:0)

Description The JMPL instruction causes a register-indirect delayed control transfer to the 
address given by "R[rs1] + R[rs2]" if the i field = 0, or "R[rs1] + sign_ext(simm13)" if 
i = 1. 

The JMPL instruction copies the PC, which contains the address of the JMPL 
instruction, into register R[rd]. 

An attempt to execute a JMPL instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 

If either of the low-order two bits of the jump address is nonzero, a 
mem_address_not_aligned exception occurs. 

Programming Notes | A JMPL instruction with rd = 15 functions as a register-indirect 
call using the standard link register. 

JMPL with rd = 0 can be used to return from a subroutine. The 
typical return address is "R[31] + 8" if a nonleaf routine (one that 
uses the SAVE instruction) is entered by a CALL instruction, or 
"R[15] + 8" if a leaf routine (one that does not use the SAVE 
instruction) is entered by a CALL instruction or by a JMPL 
instruction with rd = 15. 

When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system or being written 
into R[rd]. (closed impl. dep. #125-V9-Cs10) 

Exceptions illegal_instruction 
mem_address_not_aligned 

See Also CALL on page 164 
Bicc on page 156 
BPcc on page 162 
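The JMPL address rules described in this section (register or sign-extended simm13 offset, word alignment of the target, and the PSTATE.am mask) can be summarized as a small C model. This is a simplified sketch, not the pipeline: the struct, function, and parameter names are hypothetical, it reports misalignment as a flag rather than raising mem_address_not_aligned, and masking the PC value written to R[rd] when PSTATE.am = 1 is this sketch's reading of the PSTATE.am paragraph, labeled as an assumption in the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the 13-bit simm13 field to 64 bits. */
static int64_t sign_ext13(uint32_t simm13)
{
    assert(simm13 < (1u << 13));
    return (simm13 & 0x1000u) ? (int64_t)simm13 - 0x2000 : (int64_t)simm13;
}

struct jmpl_effect {
    uint64_t target;   /* jump address (taken after the delay slot) */
    uint64_t link;     /* value written to R[rd] */
    bool misaligned;   /* true: mem_address_not_aligned */
};

static struct jmpl_effect model_jmpl(uint64_t pc, uint64_t rs1_val, bool i,
                                     uint64_t rs2_val, uint32_t simm13,
                                     bool pstate_am)
{
    struct jmpl_effect e;
    uint64_t offset = i ? (uint64_t)sign_ext13(simm13) : rs2_val;

    e.target = rs1_val + offset;          /* unsigned add wraps modulo 2^64 */
    e.link = pc;                          /* address of the JMPL itself */
    if (pstate_am) {
        e.target &= UINT64_C(0xFFFFFFFF);
        e.link &= UINT64_C(0xFFFFFFFF);   /* ASSUMPTION: link is masked too */
    }
    e.misaligned = (e.target & 0x3u) != 0;
    return e;
}
```

For example, the leaf-routine return described in the Programming Notes, jmpl %o7 + 8, %g0, corresponds to rs1_val = R[15], i = 1, and simm13 = 8.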
LD 





7.49 Load Integer 








Instruction  op3      Operation               Assembly Language Syntax  Class
LDSB         00 1001  Load Signed Byte        ldsb  [address], reg_rd   A1
LDSH         00 1010  Load Signed Halfword    ldsh  [address], reg_rd   A1
LDSW         00 1000  Load Signed Word        ldsw  [address], reg_rd   A1
LDUB         00 0001  Load Unsigned Byte      ldub  [address], reg_rd   A1
LDUH         00 0010  Load Unsigned Halfword  lduh  [address], reg_rd   A1
LDUW         00 0000  Load Unsigned Word      lduw† [address], reg_rd   A1
LDX          00 1011  Load Extended Word      ldx   [address], reg_rd   A1

† synonym: ld

Format: op (31:30) = 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) = 0 | reserved (12:5) | rs2 (4:0)
        op (31:30) = 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) = 1 | simm13 (12:0)


Description The load integer instructions copy a byte, a halfword, a word, or an extended word 
from memory. All copy the fetched value into R[rd]. A fetched byte, halfword, or 
word is right-justified in the destination register R[rd]; it is either sign-extended or 
zero-filled on the left, depending on whether the opcode specifies a signed or 
unsigned operation, respectively. 


Load integer instructions access memory using the implicit ASI (see page 104). The 
effective address is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if i = 1. 


A successful load (notably, load extended) instruction operates atomically. 


An attempt to execute a load integer instruction when i = 0 and instruction bits 12:5 
are nonzero causes an illegal_instruction exception. 


If the effective address is not halfword-aligned, an attempt to execute an LDUH or 
LDSH causes a mem_address_not_aligned exception. If the effective address is not 
word-aligned, an attempt to execute an LDUW or LDSW instruction causes a 
mem_address_not_aligned exception. If the effective address is not doubleword- 
aligned, an attempt to execute an LDX instruction causes a 
mem_address_not_aligned exception. 
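The sign-extension, zero-fill, and alignment rules above can be sketched in C against a small byte array standing in for memory. This is an illustration under stated assumptions, not the hardware's behavior: the struct and function names are hypothetical, accesses are assumed big-endian (a little-endian implicit ASI is not modelled), and a misaligned address is reported as ok = false rather than as a trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ld_result {
    bool ok;          /* false: mem_address_not_aligned */
    uint64_t value;   /* value that would be written to R[rd] */
};

/* size = 1, 2, 4, or 8 bytes; is_signed selects LDSB/LDSH/LDSW behavior. */
static struct ld_result model_ld(const uint8_t *mem, size_t mem_len,
                                 uint64_t addr, unsigned size, bool is_signed)
{
    struct ld_result r = { false, 0 };

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    if (addr % size != 0)
        return r;                         /* not naturally aligned */
    assert(addr + size <= mem_len);       /* sketch: stay inside the buffer */
    for (unsigned k = 0; k < size; k++)   /* big-endian: first byte is most significant */
        r.value = (r.value << 8) | mem[addr + k];
    if (is_signed) {                      /* sign-extend; otherwise already zero-filled */
        uint64_t sign_bit = UINT64_C(1) << (8 * size - 1);
        uint64_t mask = (sign_bit << 1) - 1;  /* all ones when size == 8 */
        r.value = ((r.value & mask) ^ sign_bit) - sign_bit;
    }
    r.ok = true;
    return r;
}
```

For instance, the two bytes 80₁₆ 01₁₆ load as FFFF FFFF FFFF 8001₁₆ with LDSH but as 8001₁₆ with LDUH.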


V8 Compatibility Note | The SPARC V8 LD instruction was renamed LDUW in the SPARC 
V9 architecture. The LDSW instruction was new in the SPARC V9 
architecture. 


A load integer twin word (LDTW) instruction exists, but is deprecated; see Load 
Integer Twin Word on page 265 for details. 
Exceptions illegal_instruction 
mem_address_not_aligned (all except LDSB, LDUB) 
VA_watchpoint 
data_access_exception 
fast_data_access_MMU_miss 
data_access_MMU_miss 
data_access_MMU_error 
LDA 





7.50 Load Integer from Alternate Space 





Instruction  op3      Operation                                    Assembly Language Syntax             Class
LDSBA^PASI   01 1001  Load Signed Byte from Alternate Space        ldsba  [regaddr] imm_asi, reg_rd     A1
                                                                   ldsba  [reg_plus_imm] %asi, reg_rd
LDSHA^PASI   01 1010  Load Signed Halfword from Alternate Space    ldsha  [regaddr] imm_asi, reg_rd     A1
                                                                   ldsha  [reg_plus_imm] %asi, reg_rd
LDSWA^PASI   01 1000  Load Signed Word from Alternate Space        ldswa  [regaddr] imm_asi, reg_rd     A1
                                                                   ldswa  [reg_plus_imm] %asi, reg_rd
LDUBA^PASI   01 0001  Load Unsigned Byte from Alternate Space      lduba  [regaddr] imm_asi, reg_rd     A1
                                                                   lduba  [reg_plus_imm] %asi, reg_rd
LDUHA^PASI   01 0010  Load Unsigned Halfword from Alternate Space  lduha  [regaddr] imm_asi, reg_rd     A1
                                                                   lduha  [reg_plus_imm] %asi, reg_rd
LDUWA^PASI   01 0000  Load Unsigned Word from Alternate Space      lduwa† [regaddr] imm_asi, reg_rd     A1
                                                                   lduwa  [reg_plus_imm] %asi, reg_rd
LDXA^PASI    01 1011  Load Extended Word from Alternate Space      ldxa   [regaddr] imm_asi, reg_rd     A1
                                                                   ldxa   [reg_plus_imm] %asi, reg_rd











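† synonym: lda 

Format: op (31:30) = 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) = 0 | imm_asi (12:5) | rs2 (4:0)
        op (31:30) = 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) = 1 | simm13 (12:0)

Description The load integer from alternate space instructions copy a byte, a halfword, a word, 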
or an extended word from memory. All copy the fetched value into R[rd]. A fetched 
byte, halfword, or word is right-justified in the destination register R[rd]; it is either 
sign-extended or zero-filled on the left, depending on whether the opcode specifies a 
signed or unsigned operation, respectively. 


The load integer from alternate space instructions contain the address space 
identifier (ASI) to be used for the load in the imm_asi field if i = 0, or in the ASI 
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not 
privileged. The effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or 
"R[rs1] + sign_ext(simm13)" if i = 1. 


A successful load (notably, load extended) instruction operates atomically. 


A load integer twin word from alternate space (LDTWA) instruction exists, but is 
deprecated; see Load Integer Twin Word from Alternate Space on page 267 for details. 


An attempt to execute a load integer from alternate space instruction when i = 0 and 
instruction bits 12:5 are nonzero causes an illegal_instruction exception. 
If the effective address is not halfword-aligned, an attempt to execute an LDUHA or 
LDSHA instruction causes a mem_address_not_aligned exception. If the effective 
address is not word-aligned, an attempt to execute an LDUWA or LDSWA 
instruction causes a mem_address_not_aligned exception. If the effective address is 
not doubleword-aligned, an attempt to execute an LDXA instruction causes a 
mem_address_not_aligned exception. 


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI 
is 0, these instructions cause a privileged_action exception. In privileged mode 
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, 
these instructions cause a privileged_action exception. 
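The two privilege rules above depend only on the ASI value and the current mode, so they reduce to a small predicate. The C sketch below uses hypothetical enum and function names. Hyperprivileged mode is treated as never raising privileged_action; that is an assumption of the sketch, since this section does not state a rule for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum exec_mode { NONPRIVILEGED, PRIVILEGED, HYPERPRIVILEGED };

/* True if a load from alternate space with this ASI would cause
   privileged_action under the rules stated above. */
static bool lda_privileged_action(uint8_t asi, enum exec_mode m)
{
    switch (m) {
    case NONPRIVILEGED:
        return (asi & 0x80u) == 0;           /* bit 7 of the ASI is 0 */
    case PRIVILEGED:
        return asi >= 0x30 && asi <= 0x7F;   /* ASIs 30 through 7F (hex) */
    case HYPERPRIVILEGED:
        return false;                        /* ASSUMPTION: not covered above */
    }
    return false;
}
```

Note that the two checks are independent of the data_access_exception check that follows, which depends on which particular ASI is used with which instruction.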


LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, and LDUWA can be used with any 
of the following ASIs, subject to the privilege mode rules described for the 
privileged_action exception above. Use of any other ASI with these instructions 
causes a data_access_exception exception. 


ASIs valid for LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, and LDUWA 
ASI_AS_IF_PRIV_PRIMARY          ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY        ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                     ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY          ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY        ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                        ASI_REAL_LITTLE
ASI_REAL_IO                     ASI_REAL_IO_LITTLE
ASI_PRIMARY                     ASI_PRIMARY_LITTLE
ASI_SECONDARY                   ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT            ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT          ASI_SECONDARY_NO_FAULT_LITTLE


LDXA can be used with any ASI (including, but not limited to, the above list), unless 
it either (a) violates the privilege mode rules described for the privileged_action 
exception above or (b) is used with any of the following ASIs, which causes a 
data_access_exception exception. 


ASIs invalid for LDXA (cause data_access_exception exception) 
24₁₆ (aliased to 27₁₆, ASI_TWINX_N)   2C₁₆ (aliased to 2F₁₆, ASI_TWINX_NL)
22₁₆ (ASI_TWINX_AIUP)                 2A₁₆ (ASI_TWINX_AIUP_L)
23₁₆ (ASI_TWINX_AIUS)                 2B₁₆ (ASI_TWINX_AIUS_L)
26₁₆ (ASI_TWINX_REAL)                 2E₁₆ (ASI_TWINX_REAL_L)
27₁₆ (ASI_TWINX_N)                    2F₁₆ (ASI_TWINX_NL)
ASI_BLOCK_AS_IF_USER_PRIMARY          ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_USER_SECONDARY        ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE
ASI_PST8_PRIMARY                      ASI_PST8_PRIMARY_LITTLE
ASI_PST8_SECONDARY                    ASI_PST8_SECONDARY_LITTLE
ASI_PST16_PRIMARY                     ASI_PST16_PRIMARY_LITTLE
ASI_PST16_SECONDARY                   ASI_PST16_SECONDARY_LITTLE
ASI_PST32_PRIMARY                     ASI_PST32_PRIMARY_LITTLE
ASI_PST32_SECONDARY                   ASI_PST32_SECONDARY_LITTLE
ASI_FL8_PRIMARY                       ASI_FL8_PRIMARY_LITTLE
ASI_FL8_SECONDARY                     ASI_FL8_SECONDARY_LITTLE
ASI_FL16_PRIMARY                      ASI_FL16_PRIMARY_LITTLE
ASI_FL16_SECONDARY                    ASI_FL16_SECONDARY_LITTLE
ASI_BLOCK_COMMIT_PRIMARY              ASI_BLOCK_COMMIT_SECONDARY
E2₁₆ (ASI_TWINX_P)                    EA₁₆ (ASI_TWINX_PL)
E3₁₆ (ASI_TWINX_S)                    EB₁₆ (ASI_TWINX_SL)
ASI_BLOCK_PRIMARY                     ASI_BLOCK_PRIMARY_LITTLE
ASI_BLOCK_SECONDARY                   ASI_BLOCK_SECONDARY_LITTLE





Exceptions mem_address_not_aligned (all except LDSBA and LDUBA) 
privileged_action 
VA_watchpoint 
data_access_exception 
fast_data_access_MMU_miss 
data_access_MMU_miss 
data_access_MMU_error 





See Also LD on page 242 
STA on page 332 
7.51 Block Load 


The LDBLOCKF instructions are deprecated and should not be used in new 
software. A sequence of LDX instructions should be used instead. 
The LDBLOCKF instruction is intended to be a processor-specific instruction, 
which may or may not be implemented in future UltraSPARC Architecture 
implementations. Therefore, it should only be used in platform-specific 
dynamically-linked libraries, in hyperprivileged software, or in software created 
by a runtime code generator that is aware of the specific virtual processor 
implementation on which it is executing. 


























Instruction  ASI Value  Operation                                  Assembly Language Syntax                Class
LDBLOCKF     16₁₆       64-byte block load from primary address    ldda [regaddr] #ASI_BLK_AIUP, freg_rd   D2
                        space, user privilege                      ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     17₁₆       64-byte block load from secondary          ldda [regaddr] #ASI_BLK_AIUS, freg_rd   D2
                        address space, user privilege              ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     1E₁₆       64-byte block load from primary address    ldda [regaddr] #ASI_BLK_AIUPL, freg_rd  D2
                        space, little-endian, user privilege       ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     1F₁₆       64-byte block load from secondary          ldda [regaddr] #ASI_BLK_AIUSL, freg_rd  D2
                        address space, little-endian, user         ldda [reg_plus_imm] %asi, freg_rd
                        privilege
LDBLOCKF     F0₁₆       64-byte block load from primary address    ldda [regaddr] #ASI_BLK_P, freg_rd      D2
                        space                                      ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     F1₁₆       64-byte block load from secondary          ldda [regaddr] #ASI_BLK_S, freg_rd      D2
                        address space                              ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     F8₁₆       64-byte block load from primary address    ldda [regaddr] #ASI_BLK_PL, freg_rd     D2
                        space, little-endian                       ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF     F9₁₆       64-byte block load from secondary          ldda [regaddr] #ASI_BLK_SL, freg_rd     D2
                        address space, little-endian               ldda [reg_plus_imm] %asi, freg_rd

Format: op (31:30) = 11 | rd (29:25) | op3 (24:19) = 110011 | rs1 (18:14) | i (13) = 0 | imm_asi (12:5) | rs2 (4:0)
        op (31:30) = 11 | rd (29:25) | op3 (24:19) = 110011 | rs1 (18:14) | i (13) = 1 | simm13 (12:0)


Description A block load (LDBLOCKF) instruction uses one of several special block-transfer 
ASIs. Block transfer ASIs allow block loads to be performed accessing the same 
address space as normal loads. Little-endian ASIs (those with an 'L' suffix) access 
data in little-endian format; otherwise, the access is assumed to be big-endian. Byte 
swapping is performed separately for each of the eight 64-bit (double-precision) F 
registers used by the instruction. 


A block load instruction loads 64 bytes of data from a 64-byte aligned memory area 
into the eight double-precision floating-point registers specified by rd. The lowest- 
addressed eight bytes in memory are loaded into the lowest-numbered 64-bit 
(double-precision) destination F register. 


A block load only guarantees atomicity for each 64-bit (8-byte) portion of the 64 
bytes it accesses. 


The block load instruction is intended to support fast block-copy operations. 


Programming Note | LDBLOCKF is intended to be a processor-specific instruction 
(see the warning at the top of page 247). If LDBLOCKF must be 
used in software intended to be portable across current and 
previous processor implementations, then it must be coded to 
work in the face of any implementation variation that is 
permitted by implementation dependency #410-S10, described 
below. 


IMPL. DEP. #410-S10: The following aspects of the behavior of block load 
(LDBLOCKF) instructions are implementation dependent: 

■ What memory ordering model is used by LDBLOCKF (LDBLOCKF is not 
  required to follow TSO memory ordering) 

■ Whether LDBLOCKF follows memory ordering with respect to stores (including 
  block stores), including whether the virtual processor detects read-after-write and 
  write-after-read hazards to overlapping addresses 

■ Whether LDBLOCKF appears to execute out of order, or follows LoadLoad 
  ordering (with respect to older loads, younger loads, and other LDBLOCKFs) 

■ Whether LDBLOCKF follows register-dependency interlocks, as do ordinary load 
  instructions 

■ Whether LDBLOCKFs to non-cacheable locations are (a) strictly ordered, (b) not 
  strictly ordered and cause an illegal_instruction exception, or (c) not strictly 
  ordered and silently execute without causing an exception (option (c) is strongly 
  discouraged) 

■ Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of an 
  LDBLOCKF (the recommended behavior), or only on the first eight bytes 

■ Whether the MMU ignores the side-effect bit (TTE.e) for LDBLOCKF accesses 
Programming Note | If ordering with respect to earlier stores is important (for 
example, a block load that overlaps a previous store) and read-after-write 
hazards are not detected, there must be a MEMBAR #StoreLoad instruction 
between earlier stores and a block load. 


If ordering with respect to later stores is important, there must 
be a MEMBAR #LoadStore instruction between a block load 
and subsequent stores. 


If LoadLoad ordering with respect to older or younger loads or 
other block load instructions is important and is not provided 
by an implementation, an intervening MEMBAR #LoadLoad is 
required. 





For further restrictions on the behavior of the block load instruction, see 
implementation-specific processor documentation. 


Implementation Note | In all UltraSPARC Architecture implementations, the MMU 
ignores the side-effect bit (TTE.e) for LDBLOCKF accesses (impl. 
dep. #410-S10). 


Exceptions. An illegal_instruction exception occurs if LDBLOCKF's floating-point 
destination registers are not aligned on an eight-double-precision register boundary. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an LDBLOCKF instruction causes an fp disabled exception. 


If the least significant 6 bits of the effective memory address in an LDBLOCKF
instruction are nonzero, a mem address not aligned exception occurs.


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0 (ASIs 16₁₆, 17₁₆, 1E₁₆, and 1F₁₆), LDBLOCKF causes a privileged action
exception.
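The alignment and ASI-privilege conditions above reduce to two simple predicates. The C sketch below is illustrative only: the function names are hypothetical, it models only the conditions stated in this section, and it makes no claim about the priority among exceptions when several conditions hold at once.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: nonzero low 6 bits of the effective address mean the
 * access is not 64-byte aligned, so LDBLOCKF raises mem_address_not_aligned. */
bool ldblockf_misaligned(uint64_t ea)
{
    return (ea & 0x3Fu) != 0;
}

/* Hypothetical helper: in nonprivileged mode (PSTATE.priv = 0 and
 * HPSTATE.hpriv = 0), an ASI with bit 7 clear (for LDBLOCKF, the block
 * ASIs 16, 17, 1E and 1F hex) raises privileged_action. */
bool ldblockf_privileged_action(uint8_t asi, bool pstate_priv, bool hpstate_hpriv)
{
    bool nonprivileged = !pstate_priv && !hpstate_hpriv;
    return nonprivileged && (asi & 0x80u) == 0;
}
```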


An access caused by LDBLOCKF may trigger a VA watchpoint exception (impl. dep.
#410-S10).


Implementation | LDBLOCKF shares an opcode with LDDFA and LDSHORTF; it
Note | is distinguished by the ASI used.


Exceptions illegal instruction
fp disabled
mem address not aligned
privileged action
VA watchpoint (impl. dep. #410-S10)
data access exception
fast data access MMU miss
data access MMU miss
data access MMU error


See Also STBLOCKF on page 335 
7.52 Load Floating-Point Register


Instruction  op3      rd     Operation                            Assembly Language Syntax   Class
LDF          10 0000  0-31   Load Floating-Point Register         ld  [address], freg_rd     A1
LDDF         10 0011  †      Load Double Floating-Point Register  ldd [address], freg_rd     A1
LDQF         10 0010  †      Load Quad Floating-Point Register    ldq [address], freg_rd     C3

† Encoded floating-point register value, as described on page 51.

[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); if i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; if i = 1, simm13 occupies bits 12:0.]


Description The load single floating-point instruction (LDF) copies a word from memory into the
32-bit floating-point destination register F_S[rd].


The load doubleword floating-point instruction (LDDF) copies a word-aligned
doubleword from memory into a 64-bit floating-point destination register, F_D[rd].
The unit of atomicity for LDDF is 4 bytes (one word).


The load quad floating-point instruction (LDQF) copies a word-aligned quadword
from memory into a 128-bit floating-point destination register, F_Q[rd]. The unit of
atomicity for LDQF is 4 bytes (one word).


These load floating-point instructions access memory using the implicit ASI (see 
page 104). 


If i = 0, the effective address for these instructions is "R[rs1] + R[rs2]"; if i = 1,
the effective address is "R[rs1] + sign_ext(simm13)".
Exceptions. An attempt to execute an LDF, LDDF, or LDQF instruction when i = 0
and instruction bits 12:5 are nonzero causes an illegal instruction exception.
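The effective-address rule and the bits 12:5 check can be sketched in C by decoding the fields of a 32-bit instruction word. This sketch assumes the standard format-3 field positions (rs1 in bits 18:14, i in bit 13, rs2 in bits 4:0, simm13 in bits 12:0); the function names and the `r[]` array standing in for the R registers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the 13-bit immediate (instruction bits 12:0) to 64 bits. */
int64_t sign_ext13(uint32_t simm13)
{
    simm13 &= 0x1FFFu;
    return (int64_t)(simm13 ^ 0x1000u) - 0x1000;
}

/* Effective address of a format-3 load. r[] models R[0..31]; r[0] must be 0.
 * Unsigned addition wraps modulo 2^64, like the 64-bit address adder. */
uint64_t load_effective_address(uint32_t insn, const uint64_t r[32])
{
    unsigned rs1 = (insn >> 14) & 0x1Fu;
    unsigned i   = (insn >> 13) & 0x1u;
    if (i)
        return r[rs1] + (uint64_t)sign_ext13(insn & 0x1FFFu);
    return r[rs1] + r[insn & 0x1Fu];
}

/* True when i = 0 and bits 12:5 are nonzero (illegal_instruction). */
bool reserved_bits_nonzero(uint32_t insn)
{
    return ((insn >> 13) & 1u) == 0 && ((insn >> 5) & 0xFFu) != 0;
}
```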


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an LDF, LDDF, or LDQF instruction causes an fp disabled 
exception. 


If the effective address is not word-aligned, an attempt to execute an LDF instruction 
causes a mem address not aligned exception. 
LDDF requires only word alignment. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute an LDDF instruction
causes an LDDF mem address not aligned exception. In this case, trap handler
software must emulate the LDDF instruction and return (impl. dep.
#109-V9-Cs10(a)).


LDQF requires only word alignment. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an LDQF instruction
causes an LDQF mem address not aligned exception. In this case, trap handler
software must emulate the LDQF instruction and return (impl. dep.
#111-V9-Cs10(a)).
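The alignment rules for LDF, LDDF, and LDQF can be summarized as a small classifier. This C sketch is a hypothetical model, not part of the specification; it assumes, from "requires only word alignment", that an LDDF or LDQF address that is not even word-aligned raises the ordinary mem address not aligned exception.

```c
#include <stdint.h>

enum fp_load_fault {
    FAULT_NONE,
    FAULT_MEM_ADDRESS_NOT_ALIGNED,      /* mem_address_not_aligned */
    FAULT_LDDF_MEM_ADDRESS_NOT_ALIGNED, /* LDDF_mem_address_not_aligned */
    FAULT_LDQF_MEM_ADDRESS_NOT_ALIGNED  /* LDQF_mem_address_not_aligned */
};

/* size_bytes is 4 (LDF), 8 (LDDF) or 16 (LDQF). */
enum fp_load_fault fp_load_alignment_fault(uint64_t ea, unsigned size_bytes)
{
    if (ea & 0x3u)                         /* not word-aligned: always fatal */
        return FAULT_MEM_ADDRESS_NOT_ALIGNED;
    if (size_bytes == 8 && (ea & 0x7u))    /* word- but not doubleword-aligned */
        return FAULT_LDDF_MEM_ADDRESS_NOT_ALIGNED;
    if (size_bytes == 16 && (ea & 0xFu))   /* word- but not quadword-aligned */
        return FAULT_LDQF_MEM_ADDRESS_NOT_ALIGNED;
    return FAULT_NONE;
}
```

In the two special cases the trap handler is expected to emulate the load, so these faults are a slow path rather than an error.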


Programming | Some compilers issued sequences of single-precision loads for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9 processors, since emulation of 
misaligned loads is expected to be fast, compilers should issue 
sets of single-precision loads only when they can determine that 
doubleword or quadword operands are not properly aligned. 





An attempt to execute an LDQF instruction when rd{1} ≠ 0 causes an
fp exception other (FSR.ftt = invalid fp register) exception.
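To see why rd{1} ≠ 0 identifies an invalid quad register, it helps to decode the 5-bit rd field into a register number. The C sketch below assumes the SPARC V9 double/quad encoding referred to on page 51, in which rd{0} supplies bit 5 of the register number and rd{4:1} supply bits 4:1; the function names are hypothetical.

```c
#include <stdbool.h>

/* Decode the rd field of a double/quad FP instruction into a register
 * number 0..62 (single-precision numbering; bit 0 is always 0). */
unsigned fp_double_regnum(unsigned rd)
{
    rd &= 0x1Fu;
    return ((rd & 0x1u) << 5) | (rd & 0x1Eu);
}

/* A quad register number must be a multiple of 4, i.e. bit 1 of the
 * register number (which comes straight from rd{1}) must be 0. */
bool fp_quad_rd_valid(unsigned rd)
{
    return (rd & 0x2u) == 0;
}
```

For example, rd = 1 decodes to register 32, which is quad-aligned, while rd = 2 decodes to register 2, which is not.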


Implementation | Since UltraSPARC Architecture 2005 processors do not implement 
Note | in hardware instructions (including LDQF) that refer to quad- 
precision floating-point registers, the 
LDQF mem address not aligned and fp exception other (with 
FSR.ftt = invalid fp register) exceptions do not occur in
hardware. However, their effects must be emulated by software 
when the instruction causes an illegal instruction exception and 
subsequent trap. 





Destination Register(s) when Exception Occurs. If a load floating-point 
instruction generates an exception that causes a precise trap, the destination floating- 
point register(s) remain unchanged. 


IMPL. DEP. #44-V8-Cs10(a)(1): If a load floating-point instruction generates an 
exception that causes a non-precise trap, the contents of the destination floating-point 
register(s) remain unchanged or are undefined. 


Exceptions illegal instruction
fp disabled
LDDF mem address not aligned
mem address not aligned
fp exception other (FSR.ftt = invalid fp register (LDQF only))
VA watchpoint 
data access exception 
fast data access MMU miss 
data access MMU miss
data access MMU error 


See Also Load Floating-Point from Alternate Space on page 254 
Load Floating-Point State Register (Lower) on page 258 
Store Floating-Point on page 339 
7.53 Load Floating-Point from Alternate Space





Instruction   op3      rd     Operation                        Assembly Language Syntax              Class
LDFA^PASI     11 0000  0-31   Load Floating-Point Register     lda  [regaddr] imm_asi, freg_rd       A1
                              from Alternate Space             lda  [reg_plus_imm] %asi, freg_rd
LDDFA^PASI    11 0011  †      Load Double Floating-Point       ldda [regaddr] imm_asi, freg_rd       A1
                              Register from Alternate Space    ldda [reg_plus_imm] %asi, freg_rd
LDQFA^PASI    11 0010  †      Load Quad Floating-Point         ldqa [regaddr] imm_asi, freg_rd       C3
                              Register from Alternate Space    ldqa [reg_plus_imm] %asi, freg_rd

† Encoded floating-point register value, as described in Floating-Point Register Number Encoding on page 51.

[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); if i = 0, imm_asi occupies bits 12:5 and rs2 bits 4:0; if i = 1, simm13 occupies bits 12:0.]


Description The load single floating-point from alternate space instruction (LDFA) copies a word
from memory into the 32-bit floating-point destination register F_S[rd].


The load double floating-point from alternate space instruction (LDDFA) copies a
word-aligned doubleword from memory into a 64-bit floating-point destination
register, F_D[rd]. The unit of atomicity for LDDFA is 4 bytes (one word).


The load quad floating-point from alternate space instruction (LDQFA) copies a
word-aligned quadword from memory into a 128-bit floating-point destination
register, F_Q[rd]. The unit of atomicity for LDQFA is 4 bytes (one word).


If i = 0, these instructions contain the address space identifier (ASI) to be used for the
load in the imm_asi field and the effective address for the instruction is
"R[rs1] + R[rs2]". If i = 1, the ASI to be used is contained in the ASI register and the
effective address for the instruction is "R[rs1] + sign_ext(simm13)".
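The ASI selection rule can be sketched as follows, assuming the imm_asi field occupies instruction bits 12:5 and i is bit 13. The function name and the `asi_reg` parameter, which stands in for the current ASI register value, are hypothetical.

```c
#include <stdint.h>

/* Select the ASI for an alternate-space load: imm_asi (bits 12:5) when
 * i = 0, otherwise the value of the ASI register. */
uint8_t alt_space_asi(uint32_t insn, uint8_t asi_reg)
{
    if ((insn >> 13) & 1u)          /* i = 1: ASI comes from the ASI register */
        return asi_reg;
    return (uint8_t)((insn >> 5) & 0xFFu);  /* i = 0: imm_asi field */
}
```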


Exceptions. If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no
FPU is present, an attempt to execute an LDFA, LDDFA, or LDQFA instruction
causes an fp disabled exception.


LDFA causes a mem address not aligned exception if the effective memory address 
is not word-aligned. 


V9 Compatibility | LDFA, LDDFA, and LDQFA cause a privileged action exception if 
Note | PSTATE.priv = 0 and bit 7 of the ASI is 0. 
LDDFA requires only word alignment. However, if the effective address is
word-aligned but not doubleword-aligned, LDDFA causes an
LDDF mem address not aligned exception. In this case, trap handler software
must emulate the LDDFA instruction and return (impl. dep. #109-V9-Cs10(b)).


LDQFA requires only word alignment. However, if the effective address is
word-aligned but not quadword-aligned, LDQFA causes an
LDQF mem address not aligned exception. In this case, trap handler software
must emulate the LDQFA instruction and return (impl. dep. #111-V9-Cs10(b)).


An attempt to execute an LDQFA instruction when rd{1} ≠ 0 causes an
fp exception other (with FSR.ftt = invalid fp register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement 
Note | in hardware instructions (including LDQFA) that refer to quad- 
precision floating-point registers, the 
LDQF mem address not aligned and fp exception other (with 
FSR.ftt = invalid fp register) exceptions do not occur in
hardware. However, their effects must be emulated by software 
when the instruction causes an illegal instruction exception and 
subsequent trap. 





Programming | Some compilers issued sequences of single-precision loads for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9 processors, since emulation of 
misaligned loads is expected to be fast, compilers should issue 
sets of single-precision loads only when they can determine that 
doubleword or quadword operands are not properly aligned. 


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, this instruction causes a privileged action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, this
instruction causes a privileged action exception.
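The two privilege rules above can be combined into one predicate. In this C sketch the function name is hypothetical, and treating hyperprivileged mode (HPSTATE.hpriv = 1) as never raising privileged action here is an assumption, since this section only states the nonprivileged and privileged cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an alternate-space access with this ASI raises privileged_action. */
bool asi_privileged_action(uint8_t asi, bool pstate_priv, bool hpstate_hpriv)
{
    if (hpstate_hpriv)
        return false;                    /* assumption: not restricted here */
    if (!pstate_priv)
        return (asi & 0x80u) == 0;       /* nonprivileged: ASIs 00-7F hex */
    return asi >= 0x30u && asi <= 0x7Fu; /* privileged: ASIs 30-7F hex */
}
```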





LDFA and LDQFA can be used with any of the following ASIs, subject to the 
privilege mode rules described for the privileged action exception above. Use of any 
other ASI with these instructions causes a data access exception exception. 


ASIs valid for LDFA and LDQFA
ASI_AS_IF_PRIV_PRIMARY        ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY      ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                      ASI_REAL_LITTLE
ASI_REAL_IO                   ASI_REAL_IO_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT          ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT        ASI_SECONDARY_NO_FAULT_LITTLE


LDDFA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged action exception above. Use of any other ASI with
the LDDFA instruction causes a data access exception exception.





ASIs valid for LDDFA
ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                      ASI_REAL_LITTLE
ASI_REAL_IO                   ASI_REAL_IO_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT          ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT        ASI_SECONDARY_NO_FAULT_LITTLE








Behavior with Partial Store ASIs. ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ are only
defined for use in Partial Store operations (see page 347). None of them should be
used with LDDFA; however, if any of those ASIs is used with LDDFA, the LDDFA
behaves as follows:


1. IMPL. DEP. #257-U3: If an LDDFA opcode is used with an ASI of C0₁₆-C5₁₆ or
C8₁₆-CD₁₆ (Partial Store ASIs, which are an illegal combination with LDDFA) and
a memory address is specified with less than 8-byte alignment, the virtual
processor generates an exception. It is implementation dependent whether the
generated exception is a data access exception, mem address not aligned, or
LDDF mem address not aligned exception.


2. If the memory address is correctly aligned, the virtual processor generates a 
data access exception. 
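The two-step Partial Store ASI rule for LDDFA can be modeled as below. This is a hypothetical C sketch: the enum and function names are not from the specification, and the misaligned case is reported only as "implementation dependent" because impl. dep. #257-U3 leaves the choice of exception open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Partial Store ASIs: C0-C5 hex and C8-CD hex. */
bool is_partial_store_asi(uint8_t asi)
{
    return (asi >= 0xC0u && asi <= 0xC5u) || (asi >= 0xC8u && asi <= 0xCDu);
}

enum lddfa_ps_outcome {
    LDDFA_PS_NOT_PARTIAL_STORE_ASI, /* this rule does not apply */
    LDDFA_PS_IMPL_DEP_ALIGN_FAULT,  /* impl. dep. #257-U3: which exception varies */
    LDDFA_PS_DATA_ACCESS_EXCEPTION  /* correctly aligned: data_access_exception */
};

enum lddfa_ps_outcome lddfa_partial_store_outcome(uint8_t asi, uint64_t ea)
{
    if (!is_partial_store_asi(asi))
        return LDDFA_PS_NOT_PARTIAL_STORE_ASI;
    if (ea & 0x7u)                  /* less than 8-byte alignment */
        return LDDFA_PS_IMPL_DEP_ALIGN_FAULT;
    return LDDFA_PS_DATA_ACCESS_EXCEPTION;
}
```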


Destination Register(s) when Exception Occurs. If a load floating-point 
alternate instruction generates an exception that causes a precise trap, the 
destination floating-point register(s) remain unchanged. 


IMPL. DEP. #44-V8-Cs10(b): If a load floating-point alternate instruction generates 
an exception that causes a non-precise trap, it is implementation dependent whether 
the contents of the destination floating-point register(s) are undefined or are 
guaranteed to remain unchanged. 
Implementation | LDDFA shares an opcode with the LDBLOCKF and LDSHORTF
Note | instructions; it is distinguished by the ASI used. 


Exceptions illegal instruction
fp disabled
LDDF mem address not aligned
mem address not aligned
fp exception other (FSR.ftt = invalid fp register (LDQFA only))
privileged action
VA watchpoint
data access exception
fast data access MMU miss
data access MMU miss
data access MMU error





See Also Load Floating-Point Register on page 251
Block Load on page 247
Store Short Floating-Point on page 350
Store Floating-Point into Alternate Space on page 341
7.54 Load Floating-Point State Register (Lower)


The LDFSR instruction is deprecated and should not be used in new software.
The LDXFSR instruction should be used instead.





Opcode    op3      rd     Operation                                   Assembly Language Syntax   Class
LDFSR^D   10 0001  0      Load Floating-Point State Register (Lower)  ld [address], %fsr         D2
          10 0001  1-31   (see page 273)


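[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 10 0001 (24:19), rs1 (18:14), i (13); if i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; if i = 1, simm13 occupies bits 12:0.]

Description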


The Load Floating-point State Register (Lower) instruction (LDFSR) waits for all 
FPop instructions that have not finished execution to complete and then loads a 
word from memory into the less significant 32 bits of the FSR. The more-significant 
32 bits of FSR are unaffected by LDFSR. LDFSR does not alter the ver, ftt, qne, 
reserved, or unimplemented (for example, ns) fields of FSR (see page 61). 
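The effect of LDFSR on the FSR can be modeled as a masked merge of the loaded word into bits 31:0. The following C sketch assumes the SPARC V9 FSR field positions (rd 31:30, tem 27:23, ns 22, ver 19:17, ftt 16:14, qne 13, fcc0 11:10, aexc 9:5, cexc 4:0; check these against page 61) and treats ns as unimplemented, following the example in the text. The mask and function names are hypothetical, and the wait for outstanding FPops is not modeled.

```c
#include <stdint.h>

/* Lower-word bits LDFSR writes under the assumed layout:
 * rd (31:30), tem (27:23), fcc0 (11:10), aexc (9:5), cexc (4:0).
 * All other bits 31:0 (reserved bits, ns, ver, ftt, qne) are preserved. */
#define LDFSR_WRITABLE_MASK UINT64_C(0x00000000CF800FFF)

uint64_t ldfsr_merge(uint64_t old_fsr, uint32_t mem_word)
{
    /* Bits 63:32 are outside the mask, so they are left unchanged. */
    return (old_fsr & ~LDFSR_WRITABLE_MASK) |
           ((uint64_t)mem_word & LDFSR_WRITABLE_MASK);
}
```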


Programming | For future compatibility, software should only issue an LDFSR
Note | instruction with a zero value (or a value previously read from
the same field) in any reserved field of FSR.





LDFSR accesses memory using the implicit ASI (see page 122). 


An attempt to execute an LDFSR instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an LDFSR instruction causes an fp disabled exception. 


LDFSR causes a mem address not aligned exception if the effective memory
address is not word-aligned.


V8 Compatibility | The SPARC V9 architecture supports two different instructions 
Note | to load the FSR: the (deprecated) SPARC V8 LDFSR instruction 
is defined to load only the less-significant 32 bits of the FSR, 
whereas LDXFSR allows SPARC V9 programs to load all 64 bits 
of the FSR. 


258 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


LDFSR (Deprecated)

Implementation | LDFSR shares an opcode with the LDXFSR instruction (and
Note | possibly with other implementation-dependent instructions);
they are differentiated by the instruction rd field. An attempt to
execute the op = 11, op3 = 10 0001 opcode with an invalid rd
value causes an illegal instruction exception.

Exceptions illegal instruction
fp disabled
mem address not aligned
VA watchpoint
fast data access MMU miss
data access MMU miss
data access MMU error

See Also Load Floating-Point Register on page 251
Load Floating-Point State Register on page 273
Store Floating-Point on page 339
7.55 Short Floating-Point Load























Instruction  ASI Value  Operation                                  Assembly Language Syntax               Class
LDSHORTF     D0₁₆       8-bit load from primary address space      ldda [regaddr] #ASI_FL8_P, freg_rd     C3
                                                                   ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D1₁₆       8-bit load from secondary address space    ldda [regaddr] #ASI_FL8_S, freg_rd     C3
                                                                   ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D8₁₆       8-bit load from primary address space,     ldda [regaddr] #ASI_FL8_PL, freg_rd    C3
                        little-endian                              ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D9₁₆       8-bit load from secondary address space,   ldda [regaddr] #ASI_FL8_SL, freg_rd    C3
                        little-endian                              ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D2₁₆       16-bit load from primary address space     ldda [regaddr] #ASI_FL16_P, freg_rd    C3
                                                                   ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D3₁₆       16-bit load from secondary address space   ldda [regaddr] #ASI_FL16_S, freg_rd    C3
                                                                   ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     DA₁₆       16-bit load from primary address space,    ldda [regaddr] #ASI_FL16_PL, freg_rd   C3
                        little-endian                              ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     DB₁₆       16-bit load from secondary address space,  ldda [regaddr] #ASI_FL16_SL, freg_rd   C3
                        little-endian                              ldda [reg_plus_imm] %asi, freg_rd











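[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 11 0011 (24:19), rs1 (18:14), i (13); if i = 0, imm_asi occupies bits 12:5 and rs2 bits 4:0; if i = 1, simm13 occupies bits 12:0.]

Description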


Short floating-point load instructions allow an 8- or 16-bit value to be loaded from 
memory into a 64-bit floating-point register. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an LDSHORTF instruction causes an fp disabled exception. 


An 8-bit load places the loaded value in the least significant byte of F_D[rd] and
zeroes in the most-significant seven bytes of F_D[rd]. An 8-bit LDSHORTF can be
performed from an arbitrary byte address.


A 16-bit load places the loaded value in the least significant halfword of F_D[rd] and
zeroes in the more-significant three halfwords of F_D[rd]. A 16-bit LDSHORTF from an
address that is not halfword-aligned (an odd address) causes a
mem address not aligned exception.
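The zero-extension, alignment, and byte-order rules for short floating-point loads can be sketched in C. The function names are hypothetical, and `mem` is a flat byte array standing in for memory; the `little_endian` flag corresponds to the little-endian ASIs (D8₁₆, D9₁₆, DA₁₆, DB₁₆).

```c
#include <stdbool.h>
#include <stdint.h>

/* A 16-bit LDSHORTF at an odd address raises mem_address_not_aligned. */
bool ldshortf16_misaligned(uint64_t ea)
{
    return (ea & 1u) != 0;
}

/* Value an LDSHORTF leaves in F_D[rd], zero-extended to 64 bits.
 * bits is 8 or 16; the caller has already checked alignment. */
uint64_t ldshortf_value(const uint8_t *mem, uint64_t ea, unsigned bits, bool little_endian)
{
    if (bits == 8)
        return mem[ea];                                    /* any byte address */
    if (little_endian)
        return (uint64_t)mem[ea] | (uint64_t)mem[ea + 1] << 8;
    return (uint64_t)mem[ea] << 8 | (uint64_t)mem[ea + 1]; /* big-endian */
}
```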
LDSHORTF

Little-endian ASIs transfer data in little-endian format from memory; otherwise,
memory is assumed to be in big-endian byte order.

Programming | LDSHORTF is typically used with the FALIGNDATA instruction
Note | (see Align Address on page 149) to assemble or store 64 bits from
noncontiguous components.

Implementation | LDSHORTF shares an opcode with the LDBLOCKF and LDDFA
Note | instructions; it is distinguished by the ASI used.

In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause a data access exception exception, and are
emulated in software.

Exceptions VA watchpoint
data access exception
fast data access MMU miss
data access MMU miss
7.56 Load-Store Unsigned Byte


Instruction op3 Operation Assembly Language Syntax Class 


LDSTUB 00 1101 Load-Store Unsigned Byte ldstub [address], reg_rd A1





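[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 00 1101 (24:19), rs1 (18:14), i (13); if i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; if i = 1, simm13 occupies bits 12:0.]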


Description The load-store unsigned byte instruction copies a byte from memory into R[rd], then 
rewrites the addressed byte in memory to all 1's. The fetched byte is right-justified in 
the destination register R[rd] and zero-filled on the left. 


The operation is performed atomically, that is, without allowing intervening 
interrupts or deferred traps. In a multiprocessor system, two or more virtual 
processors executing LDSTUB, LDSTUBA, CASA, CASXA, SWAP, or SWAPA 
instructions addressing all or parts of the same doubleword simultaneously are 
guaranteed to execute them in an undefined, but serial, order. 
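The fetch-and-set-to-ones semantics of LDSTUB is what makes it usable as a test-and-set lock primitive. The C sketch below is a software model only: `ldstub_model` uses C11 `atomic_exchange_explicit` to stand in for the single LDSTUB instruction, the spinlock functions are hypothetical names, and the chosen memory orders are C11 choices rather than a statement about the SPARC memory model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of LDSTUB: atomically return the old byte and set it to 0xFF. */
unsigned char ldstub_model(atomic_uchar *byte)
{
    return atomic_exchange_explicit(byte, (unsigned char)0xFF, memory_order_acquire);
}

/* Test-and-test-and-set spinlock; a lock byte of 0 means "free". */
void spin_lock(atomic_uchar *lock)
{
    while (ldstub_model(lock) != 0) {
        /* Spin with plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
    }
}

bool spin_trylock(atomic_uchar *lock)
{
    return ldstub_model(lock) == 0;   /* old value 0: we took the lock */
}

void spin_unlock(atomic_uchar *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Spinning on ordinary loads between LDSTUB attempts is a common design choice because it avoids repeatedly writing the lock byte while another virtual processor holds it.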


LDSTUB accesses memory using the implicit ASI (see page 104). The effective
address for this instruction is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


An attempt to execute an LDSTUB instruction when i = 0 and instruction bits 12:5 
are nonzero causes an illegal instruction exception. 


Exceptions illegal instruction 
VA watchpoint 
data access exception 
fast data access MMU miss 
data access MMU miss 
data access MMU error 
fast data access protection 
LDSTUBA





7.57 Load-Store Unsigned Byte to Alternate Space





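Instruction    op3      Operation                        Assembly Language Syntax              Class
LDSTUBA^PASI   01 1101  Load-Store Unsigned Byte into    ldstuba [regaddr] imm_asi, reg_rd     A1
                        Alternate Space                  ldstuba [reg_plus_imm] %asi, reg_rd

[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 01 1101 (24:19), rs1 (18:14), i (13); if i = 0, imm_asi occupies bits 12:5 and rs2 bits 4:0; if i = 1, simm13 occupies bits 12:0.]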


Description The load-store unsigned byte into alternate space instruction copies a byte from 
memory into R[rd], then rewrites the addressed byte in memory to all 1's. The 
fetched byte is right-justified in the destination register R[rd] and zero-filled on the 
left. 


The operation is performed atomically, that is, without allowing intervening 
interrupts or deferred traps. In a multiprocessor system, two or more virtual 
processors executing LDSTUB, LDSTUBA, CASA, CASXA, SWAP, or SWAPA 
instructions addressing all or parts of the same doubleword simultaneously are 
guaranteed to execute them in an undefined, but serial, order. 


If i = 0, LDSTUBA contains the address space identifier (ASI) to be used for the load
in the imm_asi field. If i = 1, the ASI is found in the ASI register. In nonprivileged
mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI is 0, this
instruction causes a privileged action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, this
instruction causes a privileged action exception.


LDSTUBA can be used with any of the following ASIs, subject to the privilege mode 
rules described for the privileged action exception above. Use of any other ASI with 
this instruction causes a data access exception exception. 





ASIs valid for LDSTUBA
ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                      ASI_REAL_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE


CHAPTER 7 * Instructions 263 




Exceptions privileged_action 
VA_watchpoint 
data access exception 
fast data access MMU miss 
data access MMU miss 
data access MMU error 
fast data access protection 
LDTW (Deprecated)





7.58 Load Integer Twin Word


The LDTW instruction is deprecated and should not be used in new software. It
is provided only for compatibility with previous versions of the architecture. The
LDX instruction should be used instead.





Instruction  op3      Operation                Assembly Language Syntax†  Class
LDTW^D       00 0011  Load Integer Twin Word   ldtw [address], reg_rd     D2

† The original assembly language syntax for this instruction used an “ldd” instruction mnemonic, which is now
deprecated. Over time, assemblers will support the new “ldtw” mnemonic for this instruction. In the
meantime, some existing assemblers may only recognize the original “ldd” mnemonic.


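[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 00 0011 (24:19), rs1 (18:14), i (13); if i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; if i = 1, simm13 occupies bits 12:0.]

Description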


The load integer twin word instruction (LDTW) copies two words (with doubleword 
alignment) from memory into a pair of R registers. The word at the effective 
memory address is copied into the least significant 32 bits of the even-numbered R 
register. The word at the effective memory address + 4 is copied into the least 
significant 32 bits of the following odd-numbered R register. The most significant 32 
bits of both the even-numbered and odd-numbered R registers are zero-filled. 


Note | Execution of an LDTW instruction with rd = 0 modifies only 
R[1]. 


Load integer twin word instructions access memory using the implicit ASI (see
page 104). If i = 0, the effective address for these instructions is "R[rs1] + R[rs2]";
if i = 1, the effective address is "R[rs1] + sign_ext(simm13)".


With respect to little endian memory, an LDTW instruction behaves as if it comprises 
two 32-bit loads, each of which is byte-swapped independently before being written 
into its respective destination register. 
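The register-pair placement and the per-word byte swapping described above can be modeled in C. The function names are hypothetical, `mem` is a flat byte array standing in for memory, and the caller is assumed to have already checked doubleword alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read one 32-bit word; little-endian swaps bytes within this word only. */
static uint32_t read_word(const uint8_t *m, uint64_t a, bool le)
{
    if (le)
        return (uint32_t)m[a] | (uint32_t)m[a + 1] << 8 |
               (uint32_t)m[a + 2] << 16 | (uint32_t)m[a + 3] << 24;
    return (uint32_t)m[a] << 24 | (uint32_t)m[a + 1] << 16 |
           (uint32_t)m[a + 2] << 8 | (uint32_t)m[a + 3];
}

/* Value LDTW writes to the even (odd = false) or odd (odd = true) register
 * of the pair: the word at EA or EA + 4, with bits 63:32 zero-filled. */
uint64_t ldtw_reg(const uint8_t *mem, uint64_t ea, bool odd, bool little_endian)
{
    return (uint64_t)read_word(mem, ea + (odd ? 4 : 0), little_endian);
}
```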


IMPL. DEP. #107-V9a: It is implementation dependent whether LDTW is 
implemented in hardware. If not, an attempt to execute an LDTW instruction will 
cause an unimplemented LDTW exception. 


Programming | LDTW is provided for compatibility with existing SPARC V8 
Note | software. It may execute slowly on SPARC V9 machines because 
of data path and register-access difficulties. 
SPARC V9 | LDTW was (inaccurately) named LDD in the SPARC V8 and
Compatibility | SPARC V9 specifications. It does not load a doubleword; it 
Note | loads two words (into two registers), and has been renamed 
accordingly. 


The least significant bit of the rd field in an LDTW instruction is unused and should
always be set to 0 by software. An attempt to execute an LDTW instruction that
refers to a misaligned (odd-numbered) destination register causes an
illegal instruction exception.


An attempt to execute an LDTW instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal instruction exception. 


If the effective address is not doubleword-aligned, an attempt to execute an LDTW 
instruction causes a mem address not aligned exception. 


A successful LDTW instruction operates atomically. 


Exceptions unimplemented LDTW 
illegal instruction 
mem address not aligned 
VA watchpoint 
data access exception 
fast data access MMU miss 
data access MMU miss 
data access MMU error 





See Also LDW/LDX on page 242 
STTW on page 352 
LDTWA (Deprecated)





7.59 Load Integer Twin Word from Alternate Space


The LDTWA instruction is deprecated and should not be used in new software. 
The LDXA instruction should be used instead. 


Opcode          op3      Operation                     Assembly Language Syntax†           Class
LDTWA^D,PASI    01 0011  Load Integer Twin Word from   ldtwa [regaddr] imm_asi, reg_rd     D2, Y3‡
                         Alternate Space               ldtwa [reg_plus_imm] %asi, reg_rd

† The original assembly language syntax for this instruction used an “ldda” instruction mnemonic, which is now deprecated. Over time,
assemblers will support the new “ldtwa” mnemonic for this instruction. In the meantime, some assemblers may only recognize the
original “ldda” mnemonic.

‡ Y3 for restricted ASIs (00₁₆-7F₁₆); D2 for unrestricted ASIs (80₁₆-FF₁₆)


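[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 01 0011 (24:19), rs1 (18:14), i (13); if i = 0, imm_asi occupies bits 12:5 and rs2 bits 4:0; if i = 1, simm13 occupies bits 12:0.]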


Description The load integer twin word from alternate space instruction (LDTWA) copies two
32-bit words from memory (with doubleword memory alignment) into a pair of R
registers. The word at the effective memory address is copied into the least
significant 32 bits of the even-numbered R register. The word at the effective
memory address + 4 is copied into the least significant 32 bits of the following
odd-numbered R register. The most significant 32 bits of both the even-numbered and
odd-numbered R registers are zero-filled.


Note | Execution of an LDTWA instruction with rd = 0 modifies only 
R[1]. 


If i = 0, the LDTWA instruction contains the address space identifier (ASI) to be used
for the load in its imm_asi field and the effective address for the instruction is
"R[rs1] + R[rs2]". If i = 1, the ASI to be used is contained in the ASI register and the
effective address for the instruction is "R[rs1] + sign_ext(simm13)".


With respect to little endian memory, an LDTWA instruction behaves as if it is 
composed of two 32-bit loads, each of which is byte-swapped independently before 
being written into its respective destination register. 
IMPL. DEP. #107-V9b: It is implementation dependent whether LDTWA is
implemented in hardware. If not, an attempt to execute an LDTWA instruction will 
cause an unimplemented LDTW exception so that it can be emulated. 


Programming | LDTWA is provided for compatibility with existing SPARC V8
Note | software. It may execute slowly on SPARC V9 machines because
of data path and register-access difficulties.

If LDTWA is emulated in software, an LDXA instruction should
be used for the memory access in the emulation code in order
to preserve atomicity.
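When LDTWA is emulated with a single LDXA, the emulation must split the 64-bit value into the register pair. The C sketch below shows that split; the struct and function names are hypothetical. For a big-endian ASI the word at EA is the upper half of the loaded doubleword. For a little-endian ASI, LDXA byte-swaps all eight bytes while LDTWA swaps each word independently, so the two halves trade places.

```c
#include <stdbool.h>
#include <stdint.h>

struct reg_pair { uint64_t even, odd; };

/* v is the value a single 64-bit LDXA of the doubleword at EA returned,
 * using the same ASI (and therefore the same byte order) as the LDTWA. */
struct reg_pair ldtwa_split(uint64_t v, bool little_endian)
{
    uint64_t hi = v >> 32, lo = v & 0xFFFFFFFFu;
    struct reg_pair p;
    p.even = little_endian ? lo : hi;  /* word at EA */
    p.odd  = little_endian ? hi : lo;  /* word at EA + 4 */
    return p;
}
```

For memory bytes 01 02 03 04 05 06 07 08 at EA, a big-endian LDXA yields 0x0102030405060708 and the pair (0x01020304, 0x05060708); a little-endian LDXA yields 0x0807060504030201 and the pair (0x04030201, 0x08070605).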


SPARC V9 | LDTWA was (inaccurately) named LDDA in the SPARC V8 and 
Compatibility | SPARC V9 specifications. 
Note 


The least significant bit of the rd field in an LDTWA instruction is unused and
should always be set to 0 by software. An attempt to execute an LDTWA instruction
that refers to a misaligned (odd-numbered) destination register causes an
illegal instruction exception.


If the effective address is not doubleword-aligned, an attempt to execute an LDTWA 
instruction causes a mem address not aligned exception. 


A successful LDTWA instruction operates atomically. 


LDTWA causes a mem address not aligned exception if the address is not 
doubleword-aligned. 


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, these instructions cause a privileged_action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆,
these instructions cause a privileged_action exception.


LDTWA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged action exception above. Use of any other ASI with
this instruction causes a data access exception exception (impl. dep. #300-U4-Cs10).

ASIs valid for LDTWA
ASI_NUCLEUS                            ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY                 ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY               ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                               ASI_REAL_LITTLE
ASI_REAL_IO                            ASI_REAL_IO_LITTLE
22₁₆† (ASI_TWINX_AIUP)                 2A₁₆† (ASI_TWINX_AIUP_L)
23₁₆† (ASI_TWINX_AIUS)                 2B₁₆† (ASI_TWINX_AIUS_L)
24₁₆† (aliased to 27₁₆, ASI_TWINX_N)   2C₁₆† (aliased to 2F₁₆, ASI_TWINX_NL)
26₁₆† (ASI_TWINX_REAL)                 2E₁₆† (ASI_TWINX_REAL_L)
27₁₆† (ASI_TWINX_N)                    2F₁₆† (ASI_TWINX_NL)
ASI_PRIMARY                            ASI_PRIMARY_LITTLE
ASI_SECONDARY                          ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT                   ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT                 ASI_SECONDARY_NO_FAULT_LITTLE
E2₁₆† (ASI_TWINX_P)                    EA₁₆† (ASI_TWINX_PL)
E3₁₆† (ASI_TWINX_S)                    EB₁₆† (ASI_TWINX_SL)

† If this ASI is used with the opcode for LDTWA and i = 0, the LDTXA
instruction is executed instead of LDTWA. For behavior of LDTXA,
see Load Integer Twin Extended Word from Alternate Space on page 270.
If this ASI is used with the opcode for LDTWA and i = 1, behavior is
undefined.

Programming | Nontranslating ASIs (see page 421) should only be accessed
Note | using LDXA (not LDTWA) instructions. If an LDTWA
referencing a nontranslating ASI is executed, per the above
table, it generates a data access exception exception (impl. dep.
#300-U4-Cs10).

Implementation | The deprecated instruction LDTWA shares an opcode with
Note | LDTXA. LDTXA is not deprecated and has different address
alignment requirements than LDTWA. See Load Integer Twin
Extended Word from Alternate Space on page 270.

Exceptions unimplemented LDTW
illegal instruction
mem address not aligned
privileged action
VA watchpoint
data access exception
fast data access MMU miss
data access MMU miss
data access MMU error

See Also LDWA/LDXA on page 244
LDTXA on page 270
STTWA on page 354
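The footnoted TWINX ASIs change what the LDTWA opcode does. The following hypothetical C sketch shows that dispatch: with a TWINX ASI and i = 0 the opcode executes LDTXA, with i = 1 the behavior is undefined, and otherwise the normal LDTWA path (including its other ASI and alignment checks, not modeled here) applies. The enum and function names are not from the specification.

```c
#include <stdint.h>

enum ldtwa_opcode_action { EXEC_LDTWA, EXEC_LDTXA, BEHAVIOR_UNDEFINED };

/* ASIs marked with a dagger in the table above (the TWINX ASIs). */
static int is_twinx_asi(uint8_t asi)
{
    switch (asi) {
    case 0x22: case 0x23: case 0x24: case 0x26: case 0x27:
    case 0x2A: case 0x2B: case 0x2C: case 0x2E: case 0x2F:
    case 0xE2: case 0xE3: case 0xEA: case 0xEB:
        return 1;
    default:
        return 0;
    }
}

enum ldtwa_opcode_action ldtwa_opcode_dispatch(uint8_t asi, unsigned i)
{
    if (!is_twinx_asi(asi))
        return EXEC_LDTWA;
    return i == 0 ? EXEC_LDTXA : BEHAVIOR_UNDEFINED;
}
```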
LDTXA





7.60 Load Integer Twin Extended Word from Alternate Space [VIS 2+]





The LDTXA instructions are not guaranteed to be implemented on all
UltraSPARC Architecture implementations. Therefore, they should only be
used in platform-specific dynamically-linked libraries, in hyperprivileged
software, or in software created by a runtime code generator that is aware of the
specific virtual processor implementation on which it is executing.


















































Instruction  ASI Value  Operation                                        Assembly Language Syntax†                    Class
LDTXAᴺ       22₁₆       Load Integer Twin Extended Word, as if user      ldtxa [reg_addr] #ASI_TWINX_AIUP, reg_rd     N1
                        (nonprivileged), Primary address space
             23₁₆       Load Integer Twin Extended Word, as if user      ldtxa [reg_addr] #ASI_TWINX_AIUS, reg_rd     N1
                        (nonprivileged), Secondary address space
             26₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_REAL, reg_rd     N1
                        real address
             27₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_N, reg_rd        N1
                        nucleus context
             2A₁₆       Load Integer Twin Extended Word, as if user      ldtxa [reg_addr] #ASI_TWINX_AIUP_L, reg_rd   N1
                        (nonprivileged), Primary address space,
                        little-endian
             2B₁₆       Load Integer Twin Extended Word, as if user      ldtxa [reg_addr] #ASI_TWINX_AIUS_L, reg_rd   N1
                        (nonprivileged), Secondary address space,
                        little-endian
             2E₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_REAL_L, reg_rd   N1
                        real address, little-endian
             2F₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_NL, reg_rd       N1
                        nucleus context, little-endian
LDTXAᴺ       E2₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_P, reg_rd        N1
                        Primary address space
             E3₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_S, reg_rd        N1
                        Secondary address space
             EA₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_PL, reg_rd       N1
                        Primary address space, little-endian
             EB₁₆       Load Integer Twin Extended Word,                 ldtxa [reg_addr] #ASI_TWINX_SL, reg_rd       N1
                        Secondary address space, little-endian

† The original assembly language syntax for these instructions used the “ldda” instruction mnemonic. That syntax is now deprecated.
Over time, assemblers will support the new “ldtxa” mnemonic for this instruction. In the meantime, some existing assemblers may
only recognize the original “ldda” mnemonic.





270 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13), imm_asi (12:5), rs2 (4:0)]

Description


ASIs 26₁₆, 2E₁₆, E2₁₆, E3₁₆, F0₁₆, and F1₁₆ are used with the LDTXA instruction to
atomically read a 128-bit data item into a pair of 64-bit R registers (a “twin extended
word”). The data are placed in an even/odd pair of 64-bit registers. The lowest-address
64 bits are placed in the even-numbered register; the highest-address 64 bits
are placed in the odd-numbered register.


Note | Execution of an LDTXA instruction with rd = 0 modifies only 
R[1]. 


ASIs E2₁₆, E3₁₆, F0₁₆, and F1₁₆ perform an access using a virtual address, while ASIs
26₁₆ and 2E₁₆ use a real address.


An LDTXA instruction that performs a little-endian access behaves as if it comprises 
two 64-bit loads (performed atomically), each of which is byte-swapped 
independently before being written into its respective destination register. 
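The little-endian rule above can be illustrated with a small Python model of the values an LDTXA places in its register pair. This is a sketch under stated assumptions: memory is passed as the 16 bytes starting at the (16-byte-aligned) effective address, `ldtxa_pair` is a hypothetical name, and the atomicity of the access is not modeled.

```python
def ldtxa_pair(mem: bytes, little_endian: bool) -> tuple:
    """Hypothetical model: (even register, odd register) written by LDTXA."""
    if len(mem) != 16:
        raise ValueError("LDTXA reads exactly 16 bytes")
    order = "little" if little_endian else "big"
    # The lowest-address doubleword goes to the even register, the next one to
    # the odd register. With a little-endian ASI each doubleword is byte-swapped
    # on its own; the two doublewords are not exchanged.
    even = int.from_bytes(mem[0:8], order)
    odd = int.from_bytes(mem[8:16], order)
    return even, odd
```

With memory bytes 00 through 0F, a big-endian access yields 0001020304050607₁₆ and 08090A0B0C0D0E0F₁₆, while a little-endian access yields 0706050403020100₁₆ and 0F0E0D0C0B0A0908₁₆.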


Exceptions. An attempt to execute an LDTXA instruction with an odd-numbered
destination register (rd{0} = 1) causes an illegal_instruction exception.

An attempt to execute an LDTXA instruction with an effective memory address that
is not aligned on a 16-byte boundary causes a mem_address_not_aligned exception.
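The two checks above can be sketched as a helper that reports which exception, if any, the destination-register and alignment conditions would raise. It is a hypothetical model: it ignores the ASI, MMU, and watchpoint checks, and it assumes illegal_instruction is detected before mem_address_not_aligned, as in the usual SPARC trap-priority ordering.

```python
from typing import Optional


def ldtxa_precheck(rd: int, address: int) -> Optional[str]:
    """Hypothetical model of the rd and alignment checks for LDTXA."""
    if rd & 1:               # rd{0} = 1: odd-numbered destination register
        return "illegal_instruction"
    if address % 16 != 0:    # the twin extended word must be 16-byte aligned
        return "mem_address_not_aligned"
    return None
```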


IMPL. DEP. #413-S10: It is implementation dependent whether VA_watchpoint and
PA_watchpoint exceptions are recognized on accesses to all 16 bytes of an LDTXA
instruction (the recommended behavior) or only on accesses to the first 8 bytes.


An attempted access by an LDTXA instruction to noncacheable memory causes a
data_access_exception exception (impl. dep. #306-U4-Cs10).


Programming | A key use for this instruction is to read a full TTE entry (128 bits,
Note | tag and data) in a TSB directly, without using software
interlocks. The “real address” variants can perform the access
using a real address, bypassing the VA-to-RA translation.


Programming | In hyperprivileged mode, an access to ASI E2₁₆, E3₁₆, F0₁₆, or
Note | F1₁₆ is performed using physical (not virtual) addressing.





The virtual processor MMU does not provide virtual-to-real translation for ASIs 26₁₆
and 2E₁₆; the effective address provided with either of those ASIs is interpreted
directly as a real address.


Compatibility | ASIs 27₁₆, 2F₁₆, 26₁₆, and 2E₁₆ are now standard ASIs that
Note | replace (respectively) ASIs 24₁₆, 2C₁₆, 34₁₆, and 3C₁₆ that were
supported in some previous UltraSPARC implementations.
A mem_address_not_aligned trap is taken if the access is not aligned on a 16-byte (128-bit)
boundary.

Implementation | LDTXA shares an opcode with the "i = 0" variant of the
Note | (deprecated) LDTWA instruction; they are differentiated by the
combination of the value of "i" and the ASI used in the
instruction. See Load Integer Twin Word from Alternate Space on
page 267.

Exceptions illegal_instruction
mem_address_not_aligned
privileged_action
VA_watchpoint (impl. dep. #413-S10)
data_access_exception
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error
PA_watchpoint (impl. dep. #413-S10)
data_access_error

See Also LDTWA on page 267





7.61 Load Floating-Point State Register 


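Instruction         op3      rd     Operation                            Assembly Language Syntax   Class
LDFSR (deprecated)  10 0001  0      (see page 258)
LDXFSR              10 0001  1      Load Floating-Point State Register   ldx [address], %fsr        A1
-                   10 0001  2-31   Reserved

[Instruction format: op = 11 (bits 31:30), rd (29:25), op3 = 10 0001 (24:19), rs1 (18:14), i (13), bits 12:5 (i = 0) or simm13 (12:0, i = 1), rs2 (4:0)]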


Description A load floating-point state register instruction (LDXFSR) waits for all FPop 
instructions that have not finished execution to complete and then loads a 
doubleword from memory into the FSR. 


LDXFSR does not alter the ver, ftt, qne, reserved, or unimplemented (for example, 
ns) fields of FSR (see page 61). 


Programming | For future compatibility, software should only issue an LDXFSR 
Note | instruction with a zero value (or a value previously read from 
the same field) written into any reserved field of FSR. 


LDXFSR accesses memory using the implicit ASI (see page 104). 


If i = 0, the effective address for this instruction is “R[rs1] + R[rs2]”, and if i = 1,
the effective address is “R[rs1] + sign_ext(simm13)”.
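The two addressing forms can be sketched in Python as follows. This is an illustrative model, not text from the specification: the helper names are hypothetical, register values are treated as unsigned 64-bit integers, and the sum wraps modulo 2^64. The last helper checks the doubleword alignment that LDXFSR requires.

```python
MASK64 = (1 << 64) - 1


def sign_ext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def effective_address(r_rs1: int, i: int, r_rs2: int = 0, simm13: int = 0) -> int:
    """R[rs1] + R[rs2] when i = 0, or R[rs1] + sign_ext(simm13) when i = 1."""
    offset = r_rs2 if i == 0 else sign_ext(simm13, 13)
    return (r_rs1 + offset) & MASK64


def ldxfsr_aligned(address: int) -> bool:
    """LDXFSR needs a doubleword (8-byte) aligned effective address."""
    return address % 8 == 0
```

For example, a simm13 field of 1FF8₁₆ encodes -8, so `effective_address(0x1000, 1, simm13=0x1FF8)` gives FF8₁₆.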


Exceptions. An attempt to execute an instruction encoded as op = 3 and op3 = 21₁₆
when any of the following conditions exist causes an illegal_instruction exception:

■ i = 0 and instruction bits 12:5 are nonzero
■ rd > 1
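The rd-based differentiation and the two illegal_instruction conditions can be sketched as a decoder over a 32-bit instruction word. This is a hypothetical helper, assuming the standard field positions (op in bits 31:30, rd in 29:25, op3 in 24:19, i in bit 13); it does not model the fp_disabled or alignment checks, and `make_word` is only a convenience for building test words.

```python
def decode_op3_21(word: int) -> str:
    """Hypothetical decoder for the op = 3, op3 = 21 (hex) opcode."""
    op = (word >> 30) & 0x3
    rd = (word >> 25) & 0x1F
    op3 = (word >> 19) & 0x3F
    i = (word >> 13) & 0x1
    if op != 3 or op3 != 0x21:
        return "other opcode"
    if i == 0 and (word >> 5) & 0xFF:   # reserved bits 12:5 must be zero
        return "illegal_instruction"
    if rd == 0:
        return "LDFSR"                   # deprecated form (see page 258)
    if rd == 1:
        return "LDXFSR"
    return "illegal_instruction"         # rd > 1 is reserved


def make_word(rd: int, i: int, low13: int = 0) -> int:
    """Build an op = 3, op3 = 21 (hex) word with rs1 = 0 (illustration only)."""
    return (3 << 30) | (rd << 25) | (0x21 << 19) | (i << 13) | (low13 & 0x1FFF)
```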


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present,
an attempt to execute an LDXFSR instruction causes an fp_disabled exception.

If the effective address is not doubleword-aligned, an attempt to execute an LDXFSR
instruction causes a mem_address_not_aligned exception.


Destination Register(s) when Exception Occurs. If a load floating-point state 
register instruction generates an exception that causes a precise trap, the destination 
register (FSR) remains unchanged. 
IMPL. DEP. #44-V8-Cs10(a)(2): If an LDXFSR instruction generates an exception
that causes a non-precise trap, it is implementation dependent whether the contents
of the destination register (FSR) are undefined or are guaranteed to remain unchanged.

Implementation | LDXFSR shares an opcode with the (deprecated) LDFSR
Note | instruction (and possibly with other implementation-dependent
instructions); they are differentiated by the instruction rd field.
An attempt to execute the op = 11₂, op3 = 10 0001₂ opcode with
an invalid rd value causes an illegal_instruction exception.

Exceptions illegal_instruction
fp_disabled
mem_address_not_aligned
VA_watchpoint
data_access_exception
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error

See Also Load Floating-Point Register on page 251
Load Floating-Point State Register (Lower) on page 258
Store Floating-Point State Register on page 357
MEMBAR





7.62 Memory Barrier


Instruction op3 Operation Assembly Language Syntax Class 


MEMBAR 10 1000 Memory Barrier membar membar_mask A1 





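[Instruction format: op = 10 (bits 31:30), rd = 0 (29:25), op3 = 10 1000 (24:19), rs1 = 0 1111 (18:14), i = 1 (13), bits 12:7 = 0, cmask (6:4), mmask (3:0)]

Description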


The memory barrier instruction, MEMBAR, has two complementary functions: to
express order constraints between memory references and to provide explicit control
of memory-reference completion. The membar_mask field in the suggested assembly
language is the concatenation of the cmask and mmask instruction fields.


MEMBAR introduces an order constraint between classes of memory references
appearing before the MEMBAR and memory references following it in a program.
The particular classes of memory references are specified by the mmask field.
Memory references are classified as loads (including load instructions LDSTUB[A],
SWAP[A], CASA, and CASXA) and stores (including store instructions LDSTUB[A],
SWAP[A], CASA, CASXA, and FLUSH). The mmask field specifies the classes of
memory references subject to ordering, as described below. MEMBAR applies to all
memory operations in all address spaces referenced by the issuing virtual processor,
but it has no effect on memory references by other virtual processors. When the
cmask field is nonzero, completion as well as order constraints are imposed, and the
order imposed can be more stringent than that specifiable by the mmask field alone.


A load has been performed when the value loaded has been transmitted from 
memory and cannot be modified by another virtual processor. A store has been 
performed when the value stored has become visible, that is, when the previous 
value can no longer be read by any virtual processor. In specifying the effect of 
MEMBAR, instructions are considered to be executed as if they were processed in a 
strictly sequential fashion, with each instruction completed before the next has 
begun. 


The mmask field is encoded in bits 3 through 0 of the instruction. TABLE 7-7 specifies 
the order constraint that each bit of mmask (selected when set to 1) imposes on 
memory references appearing before and after the MEMBAR. From zero to four 
mask bits may be selected in the mmask field. 
TABLE 7-7 MEMBAR mmask Encodings

Mask Bit   Assembly Language Name   Description
mmask{3}   #StoreStore              The effects of all stores appearing prior to the MEMBAR
                                    instruction must be visible to all virtual processors before the
                                    effect of any stores following the MEMBAR.
mmask{2}   #LoadStore               All loads appearing prior to the MEMBAR instruction must
                                    have been performed before the effects of any stores following
                                    the MEMBAR are visible to any other virtual processor.
mmask{1}   #StoreLoad               The effects of all stores appearing prior to the MEMBAR
                                    instruction must be visible to all virtual processors before loads
                                    following the MEMBAR may be performed.
mmask{0}   #LoadLoad                All loads appearing prior to the MEMBAR instruction must
                                    have been performed before any loads following the MEMBAR
                                    may be performed.


The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask 
field, described in TABLE 7-8, specify additional constraints on the order of memory 
references and the processing of instructions. If cmask is zero, then MEMBAR 
enforces the partial ordering specified by the mmask field; if cmask is nonzero, then 
completion and partial order constraints are applied. 


TABLE 7-8 MEMBAR cmask Encodings

Mask Bit   Function                  Assembly Language Name   Description
cmask{2}   Synchronization barrier   #Sync                    All operations (including nonmemory reference operations)
                                                              appearing prior to the MEMBAR must have been performed, and
                                                              the effects of any exceptions must be visible, before any
                                                              instruction after the MEMBAR may be initiated.
cmask{1}   Memory issue barrier      #MemIssue                All memory reference operations appearing prior to the
                                                              MEMBAR must have been performed before any memory
                                                              operation after the MEMBAR may be initiated.
cmask{0}   Lookaside barrier         #Lookaside               A store appearing prior to the MEMBAR must complete
                                                              before any load following the MEMBAR referencing the
                                                              same address can be initiated.


A MEMBAR instruction with both mmask = 0 and cmask = 0 is functionally a NOP. 
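Because the membar_mask operand is the concatenation cmask :: mmask, each name in TABLE 7-7 and TABLE 7-8 corresponds to one bit of a 7-bit value. The sketch below, using hypothetical helper names, composes a mask from those names and splits it back into its two fields; an empty mask is the NOP case just described.

```python
# Bit positions from TABLE 7-7 (mmask, bits 3:0) and TABLE 7-8 (cmask, bits 6:4).
MEMBAR_BITS = {
    "#LoadLoad": 1 << 0,
    "#StoreLoad": 1 << 1,
    "#LoadStore": 1 << 2,
    "#StoreStore": 1 << 3,
    "#Lookaside": 1 << 4,
    "#MemIssue": 1 << 5,
    "#Sync": 1 << 6,
}


def membar_mask(*names: str) -> int:
    """OR together the bits for each named constraint."""
    mask = 0
    for name in names:
        mask |= MEMBAR_BITS[name]
    return mask


def split_mask(mask: int) -> tuple:
    """Return (cmask, mmask) for a 7-bit membar_mask."""
    return (mask >> 4) & 0x7, mask & 0xF
```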


For information on the use of MEMBAR, see Memory Ordering and Synchronization on 
page 415 and Programming with the Memory Models contained in the separate volume 
UltraSPARC Architecture Application Notes. For additional information about the 
memory models themselves, see Chapter 9, Memory. 


The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


V9 Compatibility | MEMBAR with mmask = 8₁₆ and cmask = 0₁₆ (MEMBAR
Note | #StoreStore) is identical in function to the SPARC V8 STBAR
instruction, which is deprecated.





An attempt to execute a MEMBAR instruction when instruction bits 12:7 are nonzero
causes an illegal_instruction exception.

Implementation | MEMBAR shares an opcode with RDasr; it is distinguished by
Note | rs1 = 15, rd = 0, i = 1, and bit 12 = 0.
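Putting the fields together, a MEMBAR word can be assembled and recognized as sketched below. The helpers are hypothetical; they take op = 10₂ and op3 = 10 1000₂ from the opcode table, together with rs1 = 15, rd = 0, i = 1, and bits 12:7 = 0 as stated above. In SPARC V9 the same rs1 = 15, rd = 0 pattern with i = 0 is the deprecated STBAR form of RDasr, which `is_membar` rejects.

```python
def encode_membar(mask: int) -> int:
    """Assemble MEMBAR: op = 2, rd = 0, op3 = 0x28, rs1 = 15, i = 1, bits 12:7 = 0."""
    if not 0 <= mask <= 0x7F:
        raise ValueError("membar_mask must fit in 7 bits (cmask :: mmask)")
    return (0b10 << 30) | (0 << 25) | (0x28 << 19) | (15 << 14) | (1 << 13) | mask


def is_membar(word: int) -> bool:
    """True if word is a valid MEMBAR encoding (reserved bits 12:7 all zero)."""
    return (((word >> 30) & 0x3) == 0b10
            and ((word >> 25) & 0x1F) == 0
            and ((word >> 19) & 0x3F) == 0x28
            and ((word >> 14) & 0x1F) == 15
            and ((word >> 13) & 0x1) == 1
            and ((word >> 7) & 0x3F) == 0)
```

For example, `encode_membar(0x02)` (MEMBAR #StoreLoad) yields 8143E002₁₆.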





7.62.1 Memory Synchronization

The UltraSPARC Architecture provides some level of software control over memory
synchronization, through use of the MEMBAR and FLUSH instructions for explicit
control of memory ordering in program execution.


IMPL. DEP. #412-S10: An UltraSPARC Architecture implementation may define the 
operation of each MEMBAR variant in any manner that provides the required 
semantics. 


Implementation | For an UltraSPARC Architecture virtual processor that only
Note | provides TSO memory ordering semantics, three of the ordering
MEMBARs would normally be implemented as NOPs. TABLE 7-9
shows an acceptable implementation of MEMBAR for a TSO-only
UltraSPARC Architecture implementation.


TABLE 7-9 MEMBAR Semantics for TSO-only implementation 











MEMBAR variant Preferred Implementation 
StoreStore NOP 
LoadStore NOP 
StoreLoad #Sync 
LoadLoad NOP 
Sync #Sync 
MemIssue #Sync 
Lookaside #Sync 








If an UltraSPARC Architecture implementation provides a less 
restrictive memory model than TSO (for example, RMO), the 
implementation of the MEMBAR variants may be different. See 
implementation-specific documentation for details. 
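TABLE 7-9 can be read as a rule on the membar_mask bits. The sketch below adds one assumption that the table does not state: a MEMBAR naming several constraints needs the strongest action required by any of them. The constant and function names are hypothetical.

```python
# Bits that TABLE 7-9 maps to #Sync on a TSO-only implementation.
STORE_LOAD, LOOKASIDE, MEM_ISSUE, SYNC = 0x02, 0x10, 0x20, 0x40
NEEDS_SYNC = STORE_LOAD | LOOKASIDE | MEM_ISSUE | SYNC


def tso_membar_action(mask: int) -> str:
    """Preferred TSO-only implementation of a MEMBAR with this membar_mask."""
    # #StoreStore, #LoadStore, and #LoadLoad ordering is already provided by TSO.
    return "#Sync" if mask & NEEDS_SYNC else "NOP"
```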
7.62.2 Synchronization of the Virtual Processor

Synchronization of a virtual processor forces all outstanding instructions to be
completed and any associated hardware errors to be detected and reported before
any instruction after the synchronizing instruction is issued.

Synchronization can be explicitly caused by executing a synchronizing MEMBAR
instruction (MEMBAR #Sync) or by executing an LDXA/STXA/LDDFA/STDFA
instruction with an ASI that forces synchronization.

During synchronization, if a disrupting trap condition due to a hardware error is
detected and external interrupts are enabled, the disrupting trap will occur before
the instruction after the synchronizing instruction is executed. In this case, the PC
value saved in TPC during trap entry will be the address of the instruction after the
synchronizing instruction.

Programming | Completion of a MEMBAR #Sync instruction does not
Note | guarantee that data previously stored has been written all the
way out to external memory (that is, that cache writebacks to
external memory have completed). Software cannot rely on
that behavior. There is no mechanism in the UltraSPARC
Architecture that allows software to wait for all previous stores
to be written to external memory (that is, for cache writebacks to
completely drain).

7.62.3 TSO Ordering Rules affecting Use of MEMBAR

For detailed rules on use of MEMBAR to enable software to adhere to the ordering
rules on a virtual processor running with the TSO memory model, refer to TSO
Ordering Rules on page 413.

Exceptions illegal_instruction
MOVcc

7.63 Move Integer Register on Condition (MOVcc)

For Integer Condition Codes

















Instruction  op3      cond  Operation                          icc/xcc Test          Assembly Language Syntax                  Class
MOVA         10 1100  1000  Move Always                        1                     mova    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVN         10 1100  0000  Move Never                         0                     movn    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVNE        10 1100  1001  Move if Not Equal                  not Z                 movne†  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVE         10 1100  0001  Move if Equal                      Z                     move‡   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVG         10 1100  1010  Move if Greater                    not (Z or (N xor V))  movg    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVLE        10 1100  0010  Move if Less or Equal              Z or (N xor V)        movle   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVGE        10 1100  1011  Move if Greater or Equal           not (N xor V)         movge   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVL         10 1100  0011  Move if Less                       N xor V               movl    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVGU        10 1100  1100  Move if Greater, Unsigned          not (C or Z)          movgu   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVLEU       10 1100  0100  Move if Less or Equal, Unsigned    (C or Z)              movleu  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVCC        10 1100  1101  Move if Carry Clear                not C                 movcc◊  i_or_x_cc, reg_or_imm11, reg_rd   A1
                            (Greater or Equal, Unsigned)
MOVCS        10 1100  0101  Move if Carry Set                  C                     movcs∇  i_or_x_cc, reg_or_imm11, reg_rd   A1
                            (Less than, Unsigned)
MOVPOS       10 1100  1110  Move if Positive                   not N                 movpos  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVNEG       10 1100  0110  Move if Negative                   N                     movneg  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVVC        10 1100  1111  Move if Overflow Clear             not V                 movvc   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVVS        10 1100  0111  Move if Overflow Set               V                     movvs   i_or_x_cc, reg_or_imm11, reg_rd   A1

† synonym: movnz
‡ synonym: movz
◊ synonym: movgeu
∇ synonym: movlu

Programming | In assembly language, to select the appropriate condition code,
Note | include %icc or %xcc before the reg_or_imm11 field.
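The icc/xcc Test column can be written as predicates on the n, z, v, and c flags. The Python sketch below is illustrative only: the dictionary keys are the 4-bit cond values from the table, and `flags_after_cmp` is a hypothetical helper that models the 64-bit flags (xcc) produced by `cmp` (subcc). Note that each cond value and the value with bit 3 flipped select complementary tests.

```python
MASK64 = (1 << 64) - 1

# cond field -> test on (n, z, v, c), as listed in the MOVcc table.
ICC_TESTS = {
    0b1000: lambda n, z, v, c: True,                 # MOVA
    0b0000: lambda n, z, v, c: False,                # MOVN
    0b1001: lambda n, z, v, c: not z,                # MOVNE
    0b0001: lambda n, z, v, c: bool(z),              # MOVE
    0b1010: lambda n, z, v, c: not (z or (n ^ v)),   # MOVG
    0b0010: lambda n, z, v, c: bool(z or (n ^ v)),   # MOVLE
    0b1011: lambda n, z, v, c: not (n ^ v),          # MOVGE
    0b0011: lambda n, z, v, c: bool(n ^ v),          # MOVL
    0b1100: lambda n, z, v, c: not (c or z),         # MOVGU
    0b0100: lambda n, z, v, c: bool(c or z),         # MOVLEU
    0b1101: lambda n, z, v, c: not c,                # MOVCC
    0b0101: lambda n, z, v, c: bool(c),              # MOVCS
    0b1110: lambda n, z, v, c: not n,                # MOVPOS
    0b0110: lambda n, z, v, c: bool(n),              # MOVNEG
    0b1111: lambda n, z, v, c: not v,                # MOVVC
    0b0111: lambda n, z, v, c: bool(v),              # MOVVS
}


def flags_after_cmp(a: int, b: int) -> tuple:
    """Hypothetical model of xcc (n, z, v, c) after `cmp a, b` on 64-bit values."""
    a &= MASK64
    b &= MASK64
    r = (a - b) & MASK64
    n = r >> 63
    z = int(r == 0)
    v = ((a ^ b) & (a ^ r)) >> 63   # signed overflow of a - b
    c = int(a < b)                  # borrow out of the unsigned subtraction
    return n, z, v, c
```

In the branch-free example later in this section, `movle %xcc, 0, %i3` overwrites X exactly when `ICC_TESTS[0b0010]` holds for the flags of `cmp %i0, %i2`.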
For Floating-Point Condition Codes

Instruction  op3      cond  Operation                                fcc Test     Assembly Language Syntax              Class
MOVFA        10 1100  1000  Move Always                              1            mova    %fccn, reg_or_imm11, reg_rd   A1
MOVFN        10 1100  0000  Move Never                               0            movn    %fccn, reg_or_imm11, reg_rd   A1
MOVFU        10 1100  0111  Move if Unordered                        U            movu    %fccn, reg_or_imm11, reg_rd   A1
MOVFG        10 1100  0110  Move if Greater                          G            movg    %fccn, reg_or_imm11, reg_rd   A1
MOVFUG       10 1100  0101  Move if Unordered or Greater             G or U       movug   %fccn, reg_or_imm11, reg_rd   A1
MOVFL        10 1100  0100  Move if Less                             L            movl    %fccn, reg_or_imm11, reg_rd   A1
MOVFUL       10 1100  0011  Move if Unordered or Less                L or U       movul   %fccn, reg_or_imm11, reg_rd   A1
MOVFLG       10 1100  0010  Move if Less or Greater                  L or G       movlg   %fccn, reg_or_imm11, reg_rd   A1
MOVFNE       10 1100  0001  Move if Not Equal                        L or G or U  movne†  %fccn, reg_or_imm11, reg_rd   A1
MOVFE        10 1100  1001  Move if Equal                            E            move‡   %fccn, reg_or_imm11, reg_rd   A1
MOVFUE       10 1100  1010  Move if Unordered or Equal               E or U       movue   %fccn, reg_or_imm11, reg_rd   A1
MOVFGE       10 1100  1011  Move if Greater or Equal                 E or G       movge   %fccn, reg_or_imm11, reg_rd   A1
MOVFUGE      10 1100  1100  Move if Unordered or Greater or Equal    E or G or U  movuge  %fccn, reg_or_imm11, reg_rd   A1
MOVFLE       10 1100  1101  Move if Less or Equal                    E or L       movle   %fccn, reg_or_imm11, reg_rd   A1
MOVFULE      10 1100  1110  Move if Unordered or Less or Equal       E or L or U  movule  %fccn, reg_or_imm11, reg_rd   A1
MOVFO        10 1100  1111  Move if Ordered                          E or L or G  movo    %fccn, reg_or_imm11, reg_rd   A1

† synonym: movnz
‡ synonym: movz

Programming | In assembly language, to select the appropriate condition code,
Note | include %fcc0, %fcc1, %fcc2, or %fcc3 before the reg_or_imm11
field.
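The fcc Test column can be modeled the same way, assuming the usual SPARC fcc encodings (0 = equal, 1 = less, 2 = greater, 3 = unordered), which are not restated in this section. Each set lists the fcc values for which the move occurs; `fcmp_result` is a hypothetical stand-in for the fcc value a floating-point compare would produce.

```python
import math

E, L, G, U = 0, 1, 2, 3   # assumed fcc encodings: equal, less, greater, unordered

# cond field -> fcc values for which the move takes place.
FCC_TESTS = {
    0b1000: {E, L, G, U},  # MOVFA
    0b0000: set(),         # MOVFN
    0b0111: {U},           # MOVFU
    0b0110: {G},           # MOVFG
    0b0101: {G, U},        # MOVFUG
    0b0100: {L},           # MOVFL
    0b0011: {L, U},        # MOVFUL
    0b0010: {L, G},        # MOVFLG
    0b0001: {L, G, U},     # MOVFNE
    0b1001: {E},           # MOVFE
    0b1010: {E, U},        # MOVFUE
    0b1011: {E, G},        # MOVFGE
    0b1100: {E, G, U},     # MOVFUGE
    0b1101: {E, L},        # MOVFLE
    0b1110: {E, L, U},     # MOVFULE
    0b1111: {E, L, G},     # MOVFO
}


def fcmp_result(a: float, b: float) -> int:
    """fcc value for comparing a with b; any NaN operand gives unordered."""
    if math.isnan(a) or math.isnan(b):
        return U
    if a == b:
        return E
    return L if a < b else G


def fcc_cond_true(cond: int, fcc: int) -> bool:
    return fcc in FCC_TESTS[cond]
```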


[Instruction format: op = 10 (bits 31:30), rd (29:25), op3 = 10 1100 (24:19), cc2 (18), cond (17:14), i (13), cc1 (12), cc0 (11), bits 10:5 (i = 0) or simm11 (10:0, i = 1), rs2 (4:0)]








cc2  cc1  cc0  Condition Code
0    0    0    fcc0
0    0    1    fcc1
0    1    0    fcc2
0    1    1    fcc3
1    0    0    icc
1    0    1    Reserved (illegal_instruction)
1    1    0    xcc
1    1    1    Reserved (illegal_instruction)








Description These instructions test to see if cond is TRUE for the selected condition codes. If so,
they copy the value in R[rs2] if i = 0, or “sign_ext(simm11)” if i = 1, into R[rd].
The condition code used is specified by the cc2, cc1, and cc0 fields of the
instruction. If the condition is FALSE, then R[rd] is not changed.

These instructions copy an integer register to another integer register if the condition
is TRUE. The condition code that is used to determine whether the move will occur
can be either integer condition code (icc or xcc) or any floating-point condition code
(fcc0, fcc1, fcc2, or fcc3).





These instructions do not modify any condition codes. 
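The cc-field selection table and the conditional move itself can be sketched as two small functions. This is a hypothetical model that takes the already-evaluated condition as a Boolean; `select_cc` and `movcc_result` are illustrative names, and register values are unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1

# cc2 :: cc1 :: cc0 -> condition code register; 101 and 111 are reserved.
CC_FIELD = {0b000: "fcc0", 0b001: "fcc1", 0b010: "fcc2", 0b011: "fcc3",
            0b100: "icc", 0b110: "xcc"}


def select_cc(cc2: int, cc1: int, cc0: int) -> str:
    key = (cc2 << 2) | (cc1 << 1) | cc0
    if key not in CC_FIELD:
        raise ValueError("illegal_instruction: reserved cc2 :: cc1 :: cc0")
    return CC_FIELD[key]


def movcc_result(rd_old: int, cond_true: bool, i: int,
                 r_rs2: int = 0, simm11: int = 0) -> int:
    """New value of R[rd]: the source operand if cond is TRUE, else unchanged."""
    if not cond_true:
        return rd_old
    if i == 0:
        src = r_rs2
    else:
        simm11 &= 0x7FF
        src = simm11 - 0x800 if simm11 & 0x400 else simm11   # sign_ext(simm11)
    return src & MASK64
```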


Programming | Branches significantly degrade performance on many
Note | implementations. Frequently, the MOVcc and FMOVcc
instructions can be used to avoid branches. For example, the C
language if-then-else statement

    if (A > B) X = 1; else X = 0;

can be coded as

    cmp   %i0, %i2
    bg,a  %xcc, label
    or    %g0, 1, %i3    ! X = 1
    or    %g0, 0, %i3    ! X = 0
label: ...

The above sequence requires four instructions, including a branch.
With MOVcc this could be coded as:

    cmp   %i0, %i2
    or    %g0, 1, %i3    ! assume X = 1
    movle %xcc, 0, %i3   ! overwrite with X = 0

This approach takes only three instructions and no branches and
may boost performance significantly. Use MOVcc and FMOVcc
instead of branches wherever these instructions would increase
performance.





An attempt to execute a MOVcc instruction when either i = 0 and instruction bits 10:5 are
nonzero, or (cc2 :: cc1 :: cc0) = 101₂ or 111₂, causes an illegal_instruction exception.
If cc2 = 0 (that is, a floating-point condition code is being referenced in the MOVcc
instruction) and either the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or
no FPU is present, an attempt to execute a MOVcc instruction causes an fp_disabled
exception.


Exceptions illegal_instruction
fp_disabled
MOVr





7.64 Move Integer Register on Register 
Condition (MOVr) 





Instruction  op3      rcond  Operation                             Test         Assembly Language Syntax                   Class
-            10 1111  000    Reserved (illegal_instruction)        -
MOVRZ        10 1111  001    Move if Register Zero                 R[rs1] = 0   movrz†   reg_rs1, reg_or_imm10, reg_rd     A1
MOVRLEZ      10 1111  010    Move if Register Less Than or         R[rs1] ≤ 0   movrlez  reg_rs1, reg_or_imm10, reg_rd     A1
                             Equal to Zero
MOVRLZ       10 1111  011    Move if Register Less Than Zero       R[rs1] < 0   movrlz   reg_rs1, reg_or_imm10, reg_rd     A1
-            10 1111  100    Reserved (illegal_instruction)        -
MOVRNZ       10 1111  101    Move if Register Not Zero             R[rs1] ≠ 0   movrnz‡  reg_rs1, reg_or_imm10, reg_rd     A1
MOVRGZ       10 1111  110    Move if Register Greater Than Zero    R[rs1] > 0   movrgz   reg_rs1, reg_or_imm10, reg_rd     A1
MOVRGEZ      10 1111  111    Move if Register Greater Than or      R[rs1] ≥ 0   movrgez  reg_rs1, reg_or_imm10, reg_rd     A1
                             Equal to Zero

† synonym: movre
‡ synonym: movrne

[Instruction format: op = 10 (bits 31:30), rd (29:25), op3 = 10 1111 (24:19), rs1 (18:14), i (13), rcond (12:10), bits 9:5 (i = 0) or simm10 (9:0, i = 1), rs2 (4:0)]


Description If the contents of integer register R[rs1] satisfy the condition specified in the rcond
field, these instructions copy their second operand (if i = 0, R[rs2]; if i = 1,
sign_ext(simm10)) into R[rd]. If the contents of R[rs1] do not satisfy the condition,
then R[rd] is not modified.
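The rcond tests and operand selection can be sketched in Python as follows. It is an illustrative model with hypothetical names: R[rs1] is interpreted as a signed 64-bit value, as stated below, and the reserved rcond values 000₂ and 100₂ raise an error standing in for illegal_instruction.

```python
MASK64 = (1 << 64) - 1

# rcond field -> test applied to R[rs1] viewed as a signed 64-bit integer.
RCOND_TESTS = {
    0b001: lambda s: s == 0,   # MOVRZ
    0b010: lambda s: s <= 0,   # MOVRLEZ
    0b011: lambda s: s < 0,    # MOVRLZ
    0b101: lambda s: s != 0,   # MOVRNZ
    0b110: lambda s: s > 0,    # MOVRGZ
    0b111: lambda s: s >= 0,   # MOVRGEZ
}


def to_signed64(x: int) -> int:
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def movr_result(rd_old: int, r_rs1: int, rcond: int, i: int,
                r_rs2: int = 0, simm10: int = 0) -> int:
    """New value of R[rd] after a MOVr instruction."""
    if rcond not in RCOND_TESTS:
        raise ValueError("illegal_instruction: reserved rcond")
    if not RCOND_TESTS[rcond](to_signed64(r_rs1)):
        return rd_old                       # R[rd] is not modified
    if i == 0:
        src = r_rs2
    else:
        simm10 &= 0x3FF
        src = simm10 - 0x400 if simm10 & 0x200 else simm10   # sign_ext(simm10)
    return src & MASK64
```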


These instructions treat the register contents as a signed integer value; they do not 
modify any condition codes. 


Programming | The MOVr instructions are "64-bit-only" instructions; there is no 
Note | version of these instructions that operates on just the less- 
significant 32 bits of their source operands. 
Implementation | If this instruction is implemented by tagging each register value
Note | with an n (negative) and a z (zero) bit, use the table below to
determine if rcond is TRUE.





Move      Test
MOVRNZ    not Z
MOVRZ     Z
MOVRGEZ   not N
MOVRLZ    N
MOVRLEZ   N or Z
MOVRGZ    N nor Z
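The tag-based table agrees with a direct signed comparison against zero. The sketch below derives the n and z tags from a 64-bit value and checks every row of the table against the corresponding signed test. It illustrates the implementation note and uses hypothetical names.

```python
MASK64 = (1 << 64) - 1

# Tests from the implementation note, expressed on the n and z tags.
TAG_TESTS = {
    "MOVRNZ": lambda n, z: not z,
    "MOVRZ": lambda n, z: z,
    "MOVRGEZ": lambda n, z: not n,
    "MOVRLZ": lambda n, z: n,
    "MOVRLEZ": lambda n, z: n or z,
    "MOVRGZ": lambda n, z: not (n or z),   # "N nor Z"
}

# The same conditions written as signed comparisons with zero.
SIGNED_TESTS = {
    "MOVRNZ": lambda s: s != 0,
    "MOVRZ": lambda s: s == 0,
    "MOVRGEZ": lambda s: s >= 0,
    "MOVRLZ": lambda s: s < 0,
    "MOVRLEZ": lambda s: s <= 0,
    "MOVRGZ": lambda s: s > 0,
}


def tags(value: int) -> tuple:
    """(n, z) tags for a 64-bit register value."""
    value &= MASK64
    return bool(value >> 63), value == 0


def tags_agree(value: int) -> bool:
    """True if every tag-based test matches the signed comparison for value."""
    value &= MASK64
    signed = value - (1 << 64) if value >> 63 else value
    n, z = tags(value)
    return all(TAG_TESTS[name](n, z) == SIGNED_TESTS[name](signed) for name in TAG_TESTS)
```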


An attempt to execute a MOVr instruction when either i = 0 and instruction bits 9:5 are
nonzero, or rcond = 000₂ or 100₂, causes an illegal_instruction exception.

Exceptions illegal_instruction
MULScc - Deprecated





7.65 Multiply Step


The MULScc instruction is deprecated and should not be used in new software. 
The MULX instruction should be used instead. 





Opcode    op3      Operation                          Assembly Language Syntax             Class
MULSccᴰ   10 0100  Multiply Step and modify cc's      mulscc reg_rs1, reg_or_imm, reg_rd   Y3

[Instruction format: op = 10 (bits 31:30), rd (29:25), op3 = 10 0100 (24:19), rs1 (18:14), i (13), bits 12:5 (i = 0) or simm13 (12:0, i = 1), rs2 (4:0)]

Description


MULScc treats the less-significant 32 bits of R[rs1] and the less-significant 32 bits of 
the Y register as a single 64-bit, right-shiftable doubleword register. The least 
significant bit of R[rs1] is treated as if it were adjacent to bit 31 of the Y register. The 
MULScc instruction performs an addition operation, based on the least significant 
bit of Y. 


Multiplication assumes that the Y register initially contains the multiplier, R[rs1] 
contains the most significant bits of the product, and R[rs2] contains the 
multiplicand. Upon completion of the multiplication, the Y register contains the least 
significant bits of the product. 


Note | In a standard MULScc instruction, rs1 = rd. 


MULScc operates as follows:
1. If i = 0, the multiplicand is R[rs2]; if i = 1, the multiplicand is sign_ext(simm13).

2. A 32-bit value is computed by shifting the value from R[rs1] right by one bit with
“CCR.icc.n xor CCR.icc.v” replacing bit 31 of R[rs1]. (This is the proper sign for
the previous partial product.)

3. If the least significant bit of Y = 1, the shifted value from step (2) and the
multiplicand are added. If the least significant bit of Y = 0, then 0 is added to
the shifted value from step (2).
4. MULScc writes the following result values:

Register field   Value written by MULScc
CCR.icc          updated according to the result of the addition in step (3) above
R[rd]{63:32}     undefined
R[rd]{31:0}      the least-significant 32 bits of the sum from step (3) above
Y                the previous value of the Y register, shifted right by one bit, with
                 Y{31} replaced by the value of R[rs1]{0} prior to shifting in step (2)
CCR.xcc          undefined





5. The Y register is shifted right by one bit, with the least significant bit of the 
unshifted R[rs1] replacing bit 31 of Y. 
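Steps 1 through 5 can be modeled in Python to show how repeated MULScc steps build a product. The step function follows the steps above on 32-bit values and tracks only icc.n and icc.v, the two flags that feed step 2. `signed_multiply` sketches the classic SPARC V8 usage: clear icc, perform 32 MULScc steps, perform one more MULScc with a zero multiplicand to complete the final shift, then subtract the multiplicand from the high word when the multiplier is negative. That calling sequence reflects common V8 practice rather than text from this section, and both function names are hypothetical. New code should use MULX, as noted above.

```python
MASK32 = 0xFFFF_FFFF


def mulscc_step(rs1: int, multiplicand: int, y: int, n: int, v: int) -> tuple:
    """One MULScc step. Returns (R[rd]{31:0}, new Y, new icc.n, new icc.v)."""
    rs1 &= MASK32
    # Step 2: shift R[rs1]{31:0} right one bit, inserting (icc.n xor icc.v) at bit 31.
    shifted = ((n ^ v) << 31) | (rs1 >> 1)
    # Step 3: add the multiplicand only if Y{0} = 1.
    addend = (multiplicand & MASK32) if (y & 1) else 0
    result = (shifted + addend) & MASK32
    new_n = result >> 31
    new_v = (((shifted ^ result) & (addend ^ result)) >> 31) & 1   # signed overflow
    # Step 5: shift Y right one bit, inserting the unshifted R[rs1]{0} at bit 31.
    new_y = ((rs1 & 1) << 31) | ((y & MASK32) >> 1)
    return result, new_y, new_n, new_v


def signed_multiply(a: int, b: int) -> int:
    """Signed 32 x 32 -> 64-bit product built from MULScc steps (sketch)."""
    y = a & MASK32              # multiplier goes in Y
    m = b & MASK32              # multiplicand
    p, n, v = 0, 0, 0           # partial product cleared; icc.n = icc.v = 0
    for _ in range(32):
        p, y, n, v = mulscc_step(p, m, y, n, v)
    p, y, n, v = mulscc_step(p, 0, y, n, v)   # final shift, adds zero
    if a < 0:
        p = (p - m) & MASK32    # correct for treating the multiplier as unsigned
    product = (p << 32) | y     # high word from R[rd], low word from Y
    return product - (1 << 64) if product >> 63 else product
```

For example, `signed_multiply(3, -2)` returns -6, with the high word in the partial-product register and the low word in Y.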


An attempt to execute a MULScc instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.

Exceptions illegal_instruction


See Also RDY on page 303 
SDIV, SDIVcc on page 321 
SMUL, SMULcc on page 329 
UDIV, UDIVcc on page 372 
UMUL, UMULcc on page 374 
MULX / SDIVX / UDIVX





7.66 Multiply and Divide (64-bit)





Instruction  op3      Operation                       Assembly Language Syntax            Class
MULX         00 1001  Multiply (signed or unsigned)   mulx reg_rs1, reg_or_imm, reg_rd    A1
SDIVX        10 1101  Signed Divide                   sdivx reg_rs1, reg_or_imm, reg_rd   A1
UDIVX        00 1101  Unsigned Divide                 udivx reg_rs1, reg_or_imm, reg_rd   A1





[Instruction format: op = 10 (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13), bits 12:5 (i = 0) or simm13 (12:0, i = 1), rs2 (4:0)]

Description MULX computes “R[rs1] × R[rs2]” if i = 0 or “R[rs1] × sign_ext(simm13)” if i = 1,
and writes the 64-bit product into R[rd]. MULX can be used to calculate the 64-bit
product for signed or unsigned operands (the low-order 64 bits of the product are the
same in both cases).

SDIVX and UDIVX compute “R[rs1] ÷ R[rs2]” if i = 0 or
“R[rs1] ÷ sign_ext(simm13)” if i = 1, and write the 64-bit result into R[rd]. SDIVX
operates on the operands as signed integers and produces a corresponding signed
result. UDIVX operates on the operands as unsigned integers and produces a
corresponding unsigned result.

For SDIVX, if the largest negative number is divided by -1, the result should be the
largest negative number. That is:

8000 0000 0000 0000₁₆ ÷ FFFF FFFF FFFF FFFF₁₆ = 8000 0000 0000 0000₁₆

These instructions do not modify any condition codes.

An attempt to execute a MULX, SDIVX, or UDIVX instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.

Exceptions illegal_instruction
division_by_zero
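A Python model of the three operations on 64-bit register values makes the signed and unsigned cases concrete. It is a sketch with hypothetical names. It assumes that SDIVX truncates the quotient toward zero (the behavior of C integer division), which this section does not state explicitly, and it raises a Python exception as a stand-in for the division_by_zero trap.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def mulx(a: int, b: int) -> int:
    """Low 64 bits of the product; identical for signed and unsigned operands."""
    return (a * b) & MASK64


def udivx(a: int, b: int) -> int:
    a, b = a & MASK64, b & MASK64
    if b == 0:
        raise ZeroDivisionError("division_by_zero")
    return a // b


def sdivx(a: int, b: int) -> int:
    sa, sb = to_signed64(a), to_signed64(b)
    if sb == 0:
        raise ZeroDivisionError("division_by_zero")
    q = abs(sa) // abs(sb)              # magnitude, truncated toward zero
    if (sa < 0) != (sb < 0):
        q = -q
    # -2**63 / -1 gives 2**63, which wraps to 8000 0000 0000 0000 (hex).
    return q & MASK64
```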


CHAPTER 7 * Instructions 287 


NOP 





7.67 No Operation


Instruction  op2  Operation     Assembly Language Syntax  Class
NOP          100  No Operation  nop                       A1

[Instruction format: op = 00 (bits 31:30), rd = 00000 (29:25), op2 = 100 (24:22), imm22 = 0000000000000000000000 (21:0)]


Description The NOP instruction changes no program-visible state (except that of the PC 
register). 
NOP is a special case of the SETHI instruction, with imm22 = 0 and rd = 0. 
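Because NOP is SETHI with rd = 0 and imm22 = 0, its instruction word follows from the SETHI field layout shown in the format above (op = 00, rd in bits 29:25, op2 = 100 in bits 24:22, imm22 in bits 21:0). The encoder below is a hypothetical illustration.

```python
def encode_sethi(rd: int, imm22: int) -> int:
    """Assemble a SETHI word: op = 00, rd, op2 = 100, imm22."""
    return (0b00 << 30) | ((rd & 0x1F) << 25) | (0b100 << 22) | (imm22 & 0x3FFFFF)


NOP_WORD = encode_sethi(0, 0)   # the dedicated NOP encoding, 0100 0000 (hex)
```

A word such as `encode_sethi(0, 1)` also leaves program-visible state unchanged, because it writes %g0, but per the programming note below only the dedicated encoding is guaranteed to execute efficiently.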


Programming | There are many other opcodes that may execute as NOPs; 
Note | however, this dedicated NOP instruction is the only one 
guaranteed to be implemented efficiently across all 
implementations. 


Exceptions None 
NORMALW





7.68 NORMALW 


Instruction  Operation                                                   Assembly Language Syntax  Class
NORMALWᴾ     “Other” register windows become “normal” register windows   normalw                   A1

[Instruction format: op (bits 31:30), fcn (29:25), op3 (24:19), bits 18:0]





Description NORMALW is a privileged instruction that copies the value of the OTHERWIN
register to the CANRESTORE register, then sets the OTHERWIN register to zero.
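The register effect of NORMALW can be modeled as a pure function on the two window-state registers it touches. This hypothetical sketch also flags the case the programming note warns about: when CANRESTORE is already nonzero, its old value is overwritten and CANRESTORE + OTHERWIN is not preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowState:
    canrestore: int
    otherwin: int


def normalw(state: WindowState) -> WindowState:
    """CANRESTORE <- OTHERWIN, then OTHERWIN <- 0."""
    return WindowState(canrestore=state.otherwin, otherwin=0)


def normalw_is_consistent(state: WindowState) -> bool:
    """The window state stays consistent only if CANRESTORE was zero."""
    return state.canrestore == 0
```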


Programming | The NORMALW instruction is used when changing address
Notes | spaces. NORMALW indicates the current “other” windows are
now “normal” windows and should use the spill_n_normal and
fill_n_normal traps when they generate a trap due to window spill
or fill exceptions. The window state may become inconsistent if
NORMALW is used when CANRESTORE is nonzero.

This instruction allows window manipulations to be atomic,
without the value of N_REG_WINDOWS being visible to privileged
software and without an assumption that N_REG_WINDOWS is
constant (since hyperprivileged software can migrate a thread
among virtual processors, across which N_REG_WINDOWS may
vary).





In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.
Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005)
See Also ALLCLEAN on page 150 


INVALW on page 240 
OTHERW on page 291 
RESTORED on page 311 
SAVED on page 319 
7.69 OR Logical Operation





Instruction  op3      Operation                           Assembly Language Syntax            Class
OR           00 0010  Inclusive or                        or reg_rs1, reg_or_imm, reg_rd      A1
ORcc         01 0010  Inclusive or and modify cc's        orcc reg_rs1, reg_or_imm, reg_rd    A1
ORN          00 0110  Inclusive or not                    orn reg_rs1, reg_or_imm, reg_rd     A1
ORNcc        01 0110  Inclusive or not and modify cc's    orncc reg_rs1, reg_or_imm, reg_rd   A1

[Instruction format: op = 10 (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13), bits 12:5 (i = 0) or simm13 (12:0, i = 1), rs2 (4:0)]


Description These instructions implement bitwise logical or operations. They compute “R[rs1]
op R[rs2]” if i = 0, or “R[rs1] op sign_ext(simm13)” if i = 1, and write the result into
R[rd].


ORcc and ORNcc modify the integer condition codes (icc and xcc). They set the
condition codes as follows:

■ icc.v, icc.c, xcc.v, and xcc.c are set to 0
■ icc.n is copied from bit 31 of the result
■ xcc.n is copied from bit 63 of the result
■ icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0)
■ xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0)


ORN and ORNcc logically negate their second operand before applying the main 
(or) operation. 
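The results and condition codes described above can be modeled in Python on 64-bit values. This is an illustrative sketch with hypothetical names; for i = 1 the caller passes the already sign-extended simm13 as a (possibly negative) Python integer.

```python
MASK64 = (1 << 64) - 1


def or_family(r_rs1: int, operand2: int, negate: bool = False) -> int:
    """OR (negate=False) or ORN (negate=True) on 64-bit register values."""
    if negate:
        operand2 = ~operand2            # ORN complements the second operand first
    return (r_rs1 | operand2) & MASK64


def logical_cc(result: int) -> dict:
    """icc and xcc as set by ORcc and ORNcc; v and c are always cleared."""
    result &= MASK64
    return {
        "icc": {"n": (result >> 31) & 1, "z": int((result & 0xFFFF_FFFF) == 0), "v": 0, "c": 0},
        "xcc": {"n": result >> 63, "z": int(result == 0), "v": 0, "c": 0},
    }
```

For example, a result of 1_0000_0000₁₆ sets icc.z (bits 31:0 are zero) but leaves xcc.z clear.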


An attempt to execute an OR[N][cc] instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.


Exceptions illegal instruction 
OTHERW





7.70 OTHERW 


Instruction  Operation                                   Assembly Language Syntax  Class
OTHERW^P     "Normal" register windows become "other"    otherw                    A1
             register windows





Format: op = 10 (31:30) | fcn (29:25) | op3 = 11 0001 (24:19) | unused (18:0)


Description OTHERW is a privileged instruction that copies the value of the CANRESTORE
register to the OTHERWIN register, then sets the CANRESTORE register to zero.
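A minimal C model of the OTHERW effect described above, assuming a hypothetical `window_state` structure that holds the window-control registers as plain integers. It performs only the two register updates; it does not check for the inconsistent case in which OTHERWIN is already nonzero.

```c
/* Hypothetical subset of the register-window control state. */
typedef struct {
    unsigned cansave, canrestore, otherwin, cleanwin;
} window_state;

/* OTHERW: the windows counted by CANRESTORE are relabeled as "other" windows. */
void otherw(window_state *ws)
{
    ws->otherwin = ws->canrestore;  /* OTHERWIN <- CANRESTORE */
    ws->canrestore = 0;             /* CANRESTORE <- 0 */
}
```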


Programming | The OTHERW instruction is used when changing address spaces.
Notes | OTHERW indicates the current "normal" register windows are
now "other" register windows and should use the spill_n_other
and fill_n_other traps when they generate a trap due to window
spill or fill exceptions. The window state may become inconsistent
if OTHERW is used when OTHERWIN is nonzero.


This instruction allows window manipulations to be atomic,
without the value of N_REG_WINDOWS being visible to privileged
software and without an assumption that N_REG_WINDOWS is
constant (since hyperprivileged software can migrate a thread
among virtual processors, across which N_REG_WINDOWS may
vary).





In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.

Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005)
See Also ALLCLEAN on page 150
INVALW on page 240
NORMALW on page 289 
RESTORED on page 311 
SAVED on page 319 





7.71 Pixel Component Distance
(with Accumulation) [VIS 1]








Instruction  opf          Operation                                  Assembly Language Syntax           Class
PDIST        0 0011 1110  Distance between eight 8-bit components,   pdist freg_rs1, freg_rs2, freg_rd  C3
                          with accumulation


Format: op = 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description Each of the 64-bit floating-point source registers F_D[rs1] and F_D[rs2]
contains eight unsigned 8-bit values. The corresponding 8-bit values in the source
registers are subtracted (that is, each byte in F_D[rs2] is subtracted from the
corresponding byte in F_D[rs1]). The sum of the absolute values of the eight differences
is added to the integer in F_D[rd], and the resulting integer sum is stored in the
destination register, F_D[rd].


Programming | PDIST uses F_D[rd] as both a source and a destination register.
Notes | Typically, PDIST is used for motion estimation in video
compression algorithms.
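Because PDIST is emulated in software on UltraSPARC Architecture 2005 implementations, its arithmetic can be sketched in C. In this sketch the hypothetical function `pdist` takes the three 64-bit operands as integers (standing in for reads of F_D[rs1], F_D[rs2], and F_D[rd]) and returns the value to be written to F_D[rd]. Since the sum runs over all eight bytes, byte numbering does not affect the result.

```c
#include <stdint.h>

/* Sum of |rs1_byte - rs2_byte| over the eight byte lanes, accumulated into rd. */
uint64_t pdist(uint64_t rs1, uint64_t rs2, uint64_t rd)
{
    uint64_t sum = rd;
    for (int i = 0; i < 8; i++) {
        int a = (int)((rs1 >> (8 * i)) & 0xFF);  /* unsigned byte from rs1 */
        int b = (int)((rs2 >> (8 * i)) & 0xFF);  /* unsigned byte from rs2 */
        sum += (uint64_t)(a > b ? a - b : b - a);
    }
    return sum;
}
```

Chaining calls with the previous result as `rd` is how an accumulation over several 8-pixel rows (for example, a block-matching cost in motion estimation) would be built up.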


In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware, causes an illegal instruction exception, and is emulated in 
software. 


Exceptions illegal_instruction





7.72 Population Count





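Instruction  op3      Operation         Assembly Language Syntax  Class
POPC         10 1110  Population Count  popc reg_or_imm, reg_rd   C3

Format (i = 0): op = 10 (31:30) | rd (29:25) | op3 = 10 1110 (24:19) | 0 0000 (18:14) | i = 0 (13) | unused (12:5) | rs2 (4:0)
Format (i = 1): op = 10 (31:30) | rd (29:25) | op3 = 10 1110 (24:19) | 0 0000 (18:14) | i = 1 (13) | simm13 (12:0)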


Description POPC counts the number of one bits in R[rs2] if i = 0, or the number of one bits in
sign_ext(simm13) if i = 1, and stores the count in R[rd]. This instruction does not
modify the condition codes.

V9 Compatibility | Instruction bits 18 through 14 must be zero for POPC. Other
Note | encodings of this field (rs1) may be used in future versions of the
SPARC architecture for other instructions.


Programming | POPC can be used to "find first bit set" in a register. A C-language
Note | program illustrating how POPC can be used for this purpose follows:

    /* popc(x) stands for the POPC instruction: the number of 1 bits in x */
    int ffs(unsigned zz)   /* finds first 1 bit, counting from the LSB */
    {
        return popc(zz ^ (~(-zz)));   /* for nonzero zz */
    }
Inline assembly language code for ffs() is:

    neg   %IN, %M_IN           ! -zz (2's complement)
    xnor  %IN, %M_IN, %TEMP    ! ^ ~ -zz (exclusive nor)
    popc  %TEMP, %RESULT       ! result = popc(zz ^ ~ -zz)
    movrz %IN, %g0, %RESULT    ! %RESULT should be 0 for %IN = 0

where IN, M_IN, TEMP, and RESULT are integer registers.

Example computation:

    IN                 = ...00101000   ! 1st '1' bit from right is
    -IN                = ...11011000   ! bit 3 (4th bit)
    ~ -IN              = ...00100111
    IN ^ ~ -IN         = ...00001111
    popc(IN ^ ~ -IN)   = 4
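The identity used above can be checked with a portable C sketch, assuming 64-bit operands. `popc64` is a hypothetical software stand-in for the POPC instruction (it clears the lowest set bit until none remain), and `ffs_popc` is a hypothetical name chosen to avoid clashing with the POSIX `ffs` function. The explicit zero test plays the role of the `movrz` instruction.

```c
#include <stdint.h>

/* Software stand-in for POPC: count the 1 bits in x. */
int popc64(uint64_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Find first set bit via popc(zz ^ ~(-zz)).
   Returns the 1-based position of the lowest 1 bit, or 0 when zz == 0. */
int ffs_popc(uint64_t zz)
{
    if (zz == 0)
        return 0;                      /* corresponds to the movrz step */
    return popc64(zz ^ ~(0 - zz));     /* ones from bit 0 through the first 1 bit */
}
```

For the example value 0b101000, `ffs_popc` returns 4, matching the computation above.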
Programming | POPC can be used to "centrifuge" all the '1' bits in a register to the
Note | least significant end of a destination register. Assembly-language
code illustrating how POPC can be used for this purpose follows:

    popc  %IN, %DEST
    cmp   %IN, -1               ! Test for pattern of all 1's
    mov   -1, %TEMP             ! Constant -1 -> temp register
    sllx  %TEMP, %DEST, %DEST   ! (shift count of 64 same as 0)
    not   %DEST
    movcc %xcc, -1, %DEST       ! If src was -1, result is -1

where IN, TEMP, and DEST are integer registers.
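A C sketch of the same centrifuge sequence follows, assuming 64-bit operands; `popc64` and `centrifuge` are hypothetical names. The all-ones source is handled before shifting because, unlike `sllx` (which uses only the low 6 bits of the shift count), a shift by 64 is undefined in C.

```c
#include <stdint.h>

/* Software stand-in for POPC: count the 1 bits in x. */
int popc64(uint64_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Move all 1 bits of `in` to the least significant end of the result. */
uint64_t centrifuge(uint64_t in)
{
    int n = popc64(in);
    if (in == UINT64_MAX)            /* cmp/movcc step: all 1's stays all 1's */
        return UINT64_MAX;
    return ~(UINT64_MAX << n);       /* n is 0..63 here: low n bits set */
}
```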


Programming | POPC is a "64-bit-only" instruction; there is no version of this 
Note | instruction that operates on just the less-significant 32 bits of its 
source operand. 





In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware, causes an illegal instruction exception, and is emulated in 
software. 


An attempt to execute a POPC instruction when either instruction bits 18:14 are
nonzero, or i = 0 and instruction bits 12:5 are nonzero causes an illegal_instruction
exception.


Exceptions illegal_instruction
7.73 Prefetch





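Instruction      op3      Operation             Assembly Language Syntax                      Class
PREFETCH         10 1101  Prefetch Data         prefetch  [address], prefetch_fcn              A1
PREFETCHA^PASI   11 1101  Prefetch Data from    prefetcha [regaddr] imm_asi, prefetch_fcn      A1
                          Alternate Space       prefetcha [reg_plus_imm] %asi, prefetch_fcn

PREFETCH format (i = 0):  op = 11 (31:30) | fcn (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | unused (12:5) | rs2 (4:0)
PREFETCH format (i = 1):  op = 11 (31:30) | fcn (29:25) | op3 (24:19) | rs1 (18:14) | i = 1 (13) | simm13 (12:0)
PREFETCHA format (i = 0): op = 11 (31:30) | fcn (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | imm_asi (12:5) | rs2 (4:0)
PREFETCHA format (i = 1): op = 11 (31:30) | fcn (29:25) | op3 (24:19) | rs1 (18:14) | i = 1 (13) | simm13 (12:0)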
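To make the field layout concrete, this hypothetical C helper assembles the instruction word for the i = 1 form of PREFETCH, using the format's field positions (op = 11 in bits 31:30, fcn in 29:25, op3 = 10 1101 in 24:19, rs1 in 18:14, i in bit 13, simm13 in 12:0). `encode_prefetch_imm` is an illustrative name, not an assembler API. In the usage below, register number 8 corresponds to %o0.

```c
#include <stdint.h>

/* Assemble "prefetch [R[rs1] + simm13], fcn" (i = 1 form). */
uint32_t encode_prefetch_imm(unsigned fcn, unsigned rs1, int simm13)
{
    return (3u << 30)                      /* op = 11, bits 31:30 */
         | ((fcn & 0x1Fu) << 25)           /* fcn, bits 29:25 */
         | (0x2Du << 19)                   /* op3 = 10 1101, bits 24:19 */
         | ((rs1 & 0x1Fu) << 14)           /* rs1, bits 18:14 */
         | (1u << 13)                      /* i = 1, bit 13 */
         | ((uint32_t)simm13 & 0x1FFFu);   /* simm13, bits 12:0 */
}
```

For example, `encode_prefetch_imm(0, 8, 0)` (a weak prefetch for several reads at [%o0]) yields 0xC16A2000.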


TABLE 7-10 Prefetch Variants, by Function Code

fcn                  Prefetch Variant
0                    (Weak) Prefetch for several reads
1                    (Weak) Prefetch for one read
2                    (Weak) Prefetch for several writes and possibly reads
3                    (Weak) Prefetch for one write
4                    Prefetch page
5-15 (05₁₆-0F₁₆)     Reserved (illegal_instruction)
16 (10₁₆)            Implementation dependent (NOP if not implemented)
17 (11₁₆)            Prefetch to nearest unified cache
18-19 (12₁₆-13₁₆)    Implementation dependent (NOP if not implemented)
20 (14₁₆)            Strong Prefetch for several reads
21 (15₁₆)            Strong Prefetch for one read
22 (16₁₆)            Strong Prefetch for several writes and possibly reads
23 (17₁₆)            Strong Prefetch for one write
24-31 (18₁₆-1F₁₆)    Implementation dependent (NOP if not implemented)
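TABLE 7-10 can be restated as a small C classification function. The enum `fcn_kind` and the function `classify_fcn` are hypothetical. The sketch assumes fcn is the 5-bit field value (0-31) and only classifies it; it does not model what an implementation does with fcn 16, 18, 19, or 24-31.

```c
/* Hypothetical classification of a PREFETCH fcn value per TABLE 7-10. */
typedef enum {
    FCN_WEAK,           /* 0-3 */
    FCN_PAGE,           /* 4: prefetch page */
    FCN_RESERVED,       /* 5-15: illegal_instruction */
    FCN_UNIFIED_CACHE,  /* 17: prefetch to nearest unified cache */
    FCN_STRONG,         /* 20-23: strong versions of fcn 0-3 */
    FCN_IMPL_DEP        /* 16, 18, 19, 24-31: NOP if not implemented */
} fcn_kind;

fcn_kind classify_fcn(unsigned fcn)
{
    if (fcn <= 3)
        return FCN_WEAK;
    if (fcn == 4)
        return FCN_PAGE;
    if (fcn <= 15)
        return FCN_RESERVED;
    if (fcn == 17)
        return FCN_UNIFIED_CACHE;
    if (fcn >= 20 && fcn <= 23)
        return FCN_STRONG;
    return FCN_IMPL_DEP;   /* 16, 18, 19, 24-31 */
}
```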
Description A PREFETCH[A] instruction provides a hint to the virtual processor that software
expects to access a particular address in memory in the near future, so that the 
virtual processor may take action to reduce the latency of accesses near that address. 
Typically, execution of a prefetch instruction initiates movement of a block of data 
containing the addressed byte from memory toward the virtual processor or creates 
an address mapping. 


Implementation | A PREFETCH[A] instruction may be used by software to:
Note | ■ prefetch a cache line into a cache
■ prefetch a valid address translation into a TLB
■ invalidate a cache line that may have caused a correctable error during
a load instruction.


If i = 0, the effective address operand for the PREFETCH instruction is
"R[rs1] + R[rs2]"; if i = 1, it is "R[rs1] + sign_ext(simm13)".


PREFETCH instructions access the primary address space
(ASI_PRIMARY[_LITTLE]).





PREFETCHA instructions access an alternate address space. If i = 0, the address
space identifier (ASI) to be used for the instruction is in the imm_asi field. If i = 1, the
ASI is found in the ASI register.


A prefetch operates much the same as a regular load operation, but with certain 
important differences. In particular, a PREFETCH[A] instruction is non-blocking; 
subsequent instructions can continue to execute while the prefetch is in progress. 


When executed in nonprivileged or privileged mode, PREFETCH[A] has the same 
observable effect as a NOP. A prefetch instruction will not cause a trap if applied to 
an illegal or nonexistent memory address. (impl. dep. #103-V9-Ms10(e)) 


IMPL. DEP. #103-V9-Ms10(a): The size and alignment in memory of the data block 
prefetched is implementation dependent; the minimum size is 64 bytes and the 
minimum alignment is a 64-byte boundary. 
Programming | Software may prefetch 64 bytes beginning at an arbitrary
Note | address by issuing the instructions

    prefetch [address], prefetch_fcn
    prefetch [address + 63], prefetch_fcn
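Why two prefetches suffice can be checked with a short C sketch, assuming the implementation uses the minimum block size and alignment of IMPL. DEP. #103-V9-Ms10(a) (64 bytes, 64-byte aligned). Any 64-byte range starting at an arbitrary address touches at most two aligned blocks: the block containing `address` and the block containing `address + 63`. The names `block_base` and `two_prefetches_cover` are hypothetical, and the loop assumes `addr + 63` does not wrap around.

```c
#include <stdint.h>

#define PREFETCH_BLOCK 64u   /* assumed: minimum block size, 64-byte aligned */

/* Base address of the aligned 64-byte block that contains addr. */
uint64_t block_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PREFETCH_BLOCK - 1);
}

/* Returns 1 if every byte in [addr, addr + 63] lies in the block of addr
   or the block of addr + 63, i.e. the two prefetches cover the range. */
int two_prefetches_cover(uint64_t addr)
{
    uint64_t b0 = block_base(addr);
    uint64_t b1 = block_base(addr + 63);
    for (uint64_t a = addr; a <= addr + 63; a++) {
        uint64_t b = block_base(a);
        if (b != b0 && b != b1)
            return 0;
    }
    return 1;
}
```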


Variants of the prefetch instruction can be used to prepare the memory system for 
different types of accesses. 


IMPL. DEP. #103-V9-Ms10(b): An implementation may implement none, some, or
all of the defined PREFETCH[A] variants. It is implementation dependent whether
each variant (1) is not implemented and executes as a NOP, (2) is implemented and
supports the full semantics for that variant, or (3) is implemented and supports only
the simple common-case prefetching semantics for that variant.


Prefetch instructions PREFETCH and PREFETCHA generate exceptions under the 
conditions detailed in TABLE 7-11. Only the implementation-dependent prefetch 
variants (see TABLE 7-10) may generate an exception under conditions not listed in 
this table; the predefined variants only generate the exceptions listed here. 


TABLE 7-11 Behavior of PREFETCH[A] Instructions Under Exceptional Conditions

fcn                Instruction  Condition                                        Result
any                PREFETCH     i = 0 and instruction bits 12:5 are nonzero      illegal_instruction
any                PREFETCHA    reference to an ASI in the range 00₁₆-7F₁₆,      executes as NOP
                                while in nonprivileged mode
                                (privileged_action condition)
any                PREFETCHA    reference to an ASI in the range 30₁₆-7F₁₆,      executes as NOP
                                while in privileged mode
                                (privileged_action condition)
0-3 (weak)         PREFETCH[A]  condition detected for MMU miss                  executes as NOP
                                (data_access_MMU_miss or
                                fast_data_access_MMU_miss)
0-3 (weak)         PREFETCH[A]  condition detected for data_access_MMU_error     executes as NOP
0-4                PREFETCH[A]  variant unimplemented                            executes as NOP
0-4                PREFETCHA    reference to an invalid ASI                      executes as NOP
                                (ASI not listed in following table)
0-4, 17, 20-23     PREFETCH[A]  condition detected for ((TTE.cp = 0) or          executes as NOP
                                ((fcn = 0) and TTE.cv = 0)), or (TTE.e = 1)
4, 20-23 (strong)  PREFETCH[A]  prefetching the requested data would be a very   executes as NOP
                                time-consuming operation (condition detected
                                for data_access_MMU_miss)
4, 20-23 (strong)  PREFETCH[A]  prefetching the requested data would be a        executes as NOP
                                time-consuming operation (condition detected
                                for fast_data_access_MMU_miss)
4, 20-23 (strong)  PREFETCH[A]  condition detected for data_access_MMU_error     data_access_MMU_error
5-15               PREFETCH[A]  (always)                                         illegal_instruction
(05₁₆-0F₁₆)
16-31              PREFETCH[A]  variant unimplemented                            executes as NOP
(10₁₆-1F₁₆)
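The two privileged_action rows of TABLE 7-11 reduce to a range check on the ASI, sketched here in C. The names `exec_mode` and `prefetcha_asi_nop` are hypothetical. Hyperprivileged mode and the separate invalid-ASI row (ASIs missing from the list of valid ASIs) are deliberately left out of this sketch.

```c
#include <stdbool.h>

/* Hypothetical execution-mode flag (hyperprivileged mode not modeled). */
typedef enum { MODE_NONPRIVILEGED, MODE_PRIVILEGED } exec_mode;

/* True when a PREFETCHA to this ASI meets one of the privileged_action rows
   of TABLE 7-11 and therefore executes as a NOP instead of trapping. */
bool prefetcha_asi_nop(unsigned asi, exec_mode mode)
{
    if (mode == MODE_NONPRIVILEGED)
        return asi <= 0x7F;                 /* ASIs 00-7F (hex) */
    return asi >= 0x30 && asi <= 0x7F;      /* ASIs 30-7F (hex) */
}
```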
ASIs valid for PREFETCHA (all others are invalid)

ASI_AS_IF_PRIV_PRIMARY        ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY      ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT          ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT        ASI_SECONDARY_NO_FAULT_LITTLE
ASI_REAL                      ASI_REAL_LITTLE








7.73.2 Weak versus Strong Prefetches


Some prefetch variants are available in two versions, "Weak" and "Strong". 


From software's perspective, the difference between the two is the degree of
certainty that the data being prefetched will subsequently be accessed. That, in
turn, determines how much effort (time) software is willing to let the underlying
hardware invest in performing the prefetch. If the prefetch is speculative (software
believes the data will probably be needed, but isn't sure), a Weak prefetch initiates
data movement if the operation can be performed quickly, but abandons the prefetch
and behaves like a NOP if it turns out that performing the full prefetch will be
time-consuming. If software has very high confidence that the data being prefetched
will subsequently be accessed, then a Strong prefetch requests that the prefetch
operation continue, even if it does become time-consuming.


From the virtual processor's perspective, the difference between a Weak and a
Strong prefetch is whether the prefetch is allowed to perform a time-consuming
operation¹ in order to complete. If a time-consuming operation is required, a Weak
prefetch abandons the operation and behaves like a NOP, while a Strong prefetch
may pay the cost of performing the time-consuming operation so it can finish
initiating the requested data movement. Behavioral differences among loads and
prefetches are compared in TABLE 7-12.


TABLE 7-12 Comparative Behavior of Load and Weak Prefetch Operations

                                                                        Behavior
Condition                                                            Load    Prefetch
On a TLB miss, is an MMU access performed?                           Yes     Yes
Upon detection of fast_data_access_MMU_miss exception...             Traps   NOP†
Upon detection of privileged_action, data_access_exception,          Traps   NOP†
  data_access_protection, PA_watchpoint, or VA_watchpoint
  exception...
If page table entry has cp = 0, e = 1, and cv = 0 for Prefetch       Traps   NOP†
  for Several Reads
If page table entry has nfo = 1 for a non-NoFault access...          Traps   NOP†
If page table entry has w = 0 for any prefetch for write access      Traps   NOP
  (fcn = 2, 3, 22, or 23)...
Upon detection of fatal error or disrupting error conditions...      Traps   Traps
Instruction blocks until cache line filled?                          Yes     No

1. such as a fast_data_access_MMU_miss trap, plus subsequently filling the cache line at the requested address


Prefetch Variants 


The prefetch variant is selected by the fcn field of the instruction. fcn values 5-15 are
reserved for future extensions of the architecture, and PREFETCH fcn values 16, 18, 19,
and 24-31 are implementation dependent in UltraSPARC Architecture 2005.


Each prefetch variant reflects an intent on the part of the compiler or programmer, a 
"hint" to the underlying virtual processor. This is different from other instructions 
(except BPN), all of which cause specific actions to occur. An UltraSPARC 
Architecture implementation may implement a prefetch variant by any technique, as 
long as the intent of the variant is achieved (impl. dep. #103-V9-Ms10(b)). 


The prefetch instruction is designed to treat common cases well. The variants are 
intended to provide scalability for future improvements in both hardware and 
compilers. If a variant is implemented, it should have the effects described below. In 
case some of the variants listed below are implemented and some are not, a 
recommended overloading of the unimplemented variants is provided in the SPARC 
V9 specification. An implementation must treat any unimplemented prefetch fcn 
values as NOPs (impl. dep. #103-V9-Ms10). 


7.73.3.1 Prefetch for Several Reads (fcn = 0, 20 (14₁₆))


The intent of these variants is to cause movement of data into the cache nearest the 
virtual processor. 


There are Weak and Strong versions of this prefetch variant; fcn = 0 is Weak and 
fcn = 20 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


Programming | The intended use of this variant is for streaming relatively small
Note | amounts of data into the primary data cache of the virtual
processor.
7.73.3.2 Prefetch for One Read (fcn = 1, 21 (15₁₆))


The data to be read from the given address are expected to be read once and not 
reused (read or written) soon after that. Use of this PREFETCH variant indicates 
that, if possible, the data cache should be minimally disturbed by the data read from 
the given address. 


There are Weak and Strong versions of this prefetch variant; fcn = 1 is Weak and 
fcn = 21 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


Programming | The intended use of this variant is in streaming medium amounts 
Note | of data into the virtual processor without disturbing the data in 
the primary data cache memory. 


7.73.3.3 Prefetch for Several Writes (and Possibly Reads)
(fcn = 2, 22 (16₁₆))


The intent of this variant is to cause movement of data in preparation for multiple 
writes. 


There are Weak and Strong versions of this prefetch variant; fcn = 2 is Weak and 
fcn = 22 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


Programming | An example use of this variant is to initialize a cache line, in 
Note | preparation for a partial write. 


Implementation | On a multiprocessor system, this variant indicates that exclusive
Note | ownership of the addressed data is needed. Therefore, it may
have the additional effect of obtaining exclusive ownership of the
addressed cache line.





7.73.3.4 Prefetch for One Write (fcn = 3, 23 (17₁₆))


The intent of this variant is to initiate movement of data in preparation for a single 
write. This variant indicates that, if possible, the data cache should be minimally 
disturbed by the data written to this address, because those data are not expected to 
be reused (read or written) soon after they have been written once. 


There are Weak and Strong versions of this prefetch variant; fcn = 3 is Weak and 
fcn = 23 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 
7.73.3.5 Prefetch Page (fcn = 4)


In a virtual memory system, the intended action of this variant is for hardware (or 
privileged or hyperprivileged software) to initiate asynchronous mapping of the 
referenced virtual address (assuming that it is legal to do so). 


Programming | Prefetch Page is used to avoid a later page fault for the given
Note | address, or at least to shorten the latency of a page fault.





In a non-virtual-memory system or if the addressed page is already mapped, this 
variant has no effect. 


Implementation | The mapping required by Prefetch Page may be performed by 
Note | privileged software, hyperprivileged software, or hardware. 


Implementation-Dependent Prefetch Variants
(fcn = 16, 18, 19, and 24-31)


IMPL. DEP. #103-V9-Ms10(c): Whether and how PREFETCH fcns 16, 18, 19, and 24-31
are implemented is implementation dependent. If a variant is not implemented,
it must execute as a NOP.


Additional Notes 


Programming | Prefetch instructions do have some "cost to execute". As long as
Note | the cost of executing a prefetch instruction is well below the
cost of a cache miss, use of prefetching provides a net gain in
performance.


It does not appear that prefetching causes a significant number of 
useless fetches from memory, though it may increase the rate of 
useful fetches (and hence the bandwidth), because it more 
efficiently overlaps computing with fetching. 


Programming | A compiler that generates PREFETCH instructions should
Note | generate each of the variants where its use is most appropriate.
That will help portable software be reasonably efficient across a
range of hardware configurations.





Implementation | Any effects of a data prefetch operation in privileged or 
Note | hyperprivileged code should be reasonable (for example, in 
handling ECC errors, no page prefetching is allowed within code 
that handles page faults). The benefits of prefetching should be 
available to most privileged code. 
Implementation | A prefetch from a nonprefetchable location has no effect. It is up
Note | to memory management hardware to determine how locations 
are identified as not prefetchable. 


Exceptions illegal_instruction
data_access_MMU_error
7.74 Read Ancillary State Register


Instruction      rs1    Operation                                    Assembly Language Syntax   Class
RDY              0      Read Y register (deprecated)                 rd %y, reg_rd              D2
-                1      Reserved
RDCCR            2      Read Condition Codes register (CCR)          rd %ccr, reg_rd            A1
RDASI            3      Read ASI register                            rd %asi, reg_rd            A1
RDTICK^Pnpt      4      Read TICK register                           rd %tick, reg_rd           A1
RDPC             5      Read Program Counter (PC)                    rd %pc, reg_rd             A2
RDFPRS           6      Read Floating-Point Registers Status         rd %fprs, reg_rd           A1
                        (FPRS) register
-                7-14   Reserved
See text         15     MEMBAR or Reserved; see text
RDPCR^P          16     Read Performance Control registers (PCR)     rd %pcr, reg_rd            A1
RDPIC^Ppic       17     Read Performance Instrumentation Counters    rd %pic, reg_rd            A1
                        register (PIC)
-                18     Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
RDGSR            19     Read General Status register (GSR)           rd %gsr, reg_rd            A1
-                20-21  Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
RDSOFTINT^P      22     Read per-virtual processor Soft Interrupt    rd %softint, reg_rd        N2
                        register (SOFTINT)
RDTICK_CMPR^P    23     Read Tick Compare register (TICK_CMPR)       rd %tick_cmpr, reg_rd      N2
RDSTICK^Pnpt     24     Read System Tick Register (STICK)            rd %stick†, reg_rd         N2
RDSTICK_CMPR^P   25     Read System Tick Compare register            rd %stick_cmpr†, reg_rd    N2
                        (STICK_CMPR)
-                26-27  Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
-                28     Implementation dependent
                        (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
-                29-31  Implementation dependent
                        (impl. dep. #8-V8-Cs20, #9-V8-Cs20)

† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and %sys_tick_cmpr, which are
now deprecated. Over time, assemblers will support the new %stick and %stick_cmpr names for these registers (which are consistent
with %tick and %tick_cmpr). In the meantime, some existing assemblers may only recognize the original names.

Format: op = 10 (31:30) | rd (29:25) | op3 = 10 1000 (24:19) | rs1 (18:14) | unused (13:0)
Description The Read Ancillary State Register (RDasr) instructions copy the contents of the state
register specified by rs1 into R[rd]. 


An RDasr instruction with rs1 = 0 is a (deprecated) RDY instruction (which should 
not be used in new software). 


The RDY instruction is deprecated. It is recommended that all instructions that 
reference the Y register be avoided. 


RDPC copies the contents of the PC register into R[rd]. If PSTATE.am = 0, the full
64-bit address is copied into R[rd]. If PSTATE.am = 1, only a 32-bit address is saved;
PC{31:0} is copied to R[rd]{31:0} and R[rd]{63:32} is set to 0. (closed impl. dep.
#125-V9-Cs10)
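The RDPC rule amounts to masking the PC when PSTATE.am = 1. A minimal C sketch, with `rdpc_value` as a hypothetical name and PSTATE.am passed in as a flag:

```c
#include <stdint.h>

/* Value RDPC writes to R[rd]; pstate_am is PSTATE.am (0 or 1). */
uint64_t rdpc_value(uint64_t pc, unsigned pstate_am)
{
    /* With PSTATE.am = 1, only PC{31:0} is kept and R[rd]{63:32} becomes 0. */
    return pstate_am ? (pc & 0xFFFFFFFFull) : pc;
}
```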


RDFPRS waits for all pending FPops and loads of floating-point registers to 
complete before reading the FPRS register. 


The following values of rs1 are reserved for future versions of the architecture: 1,
7-14, 18, 20-21, and 26-27.


IMPL. DEP. #47-V8-Cs20: RDasr instructions with rs1 in the range 28-31 are
available for implementation-dependent uses (impl. dep. #8-V8-Cs20). For an RDasr
instruction with rs1 in the range 28-31, the following are implementation
dependent:
■ the interpretation of bits 13:0 and 29:25 in the instruction
■ whether the instruction is nonprivileged, privileged, or hyperprivileged (impl.
dep. #9-V8-Cs20), and
■ whether an attempt to execute the instruction causes an illegal_instruction
exception.


Implementation | See the section "Read/Write Ancillary State Registers (ASRs)" in
Note | Extending the UltraSPARC Architecture, contained in the separate
volume UltraSPARC Architecture Application Notes, for a
discussion of extending the SPARC V9 instruction set using read/write
ASR instructions.


Note | Ancillary state registers may include (for example) timer, counter, 
diagnostic, self-test, and trap-control registers. 


SPARC V8 | The SPARC V8 RDPSR, RDWIM, and RDTBR instructions do not 
Compatibility | exist in the UltraSPARC Architecture, since the PSR, WIM, and 
Note | TBR registers do not exist. 





See Ancillary State Registers on page 70 for more detailed information regarding ASR 
registers. 


304 UltraSPARC Architecture 2005 + Draft DO.9.2, 19 Jun 2008 


Exceptions. An attempt to execute a RDasr instruction when any of the following
conditions are true causes an illegal_instruction exception:
■ rs1 = 15 and rd ≠ 0 (reserved for future versions of the architecture)
■ rs1 = 1, 7-14, 18, 20-21, or 26-27 (reserved for future versions of the architecture)
■ instruction bits 13:0 are nonzero

An attempt to execute a RDPCR (impl. dep. #250-U3-Cs10), RDSOFTINT,
RDTICK_CMPR, or RDSTICK_CMPR instruction in nonprivileged mode
(PSTATE.priv = 0 and HPSTATE.hpriv = 0) causes a privileged_opcode exception
(impl. dep. #250-U3-Cs10).

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present,
an attempt to execute a RDGSR instruction causes an fp_disabled exception.

In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), the following
cause a privileged_action exception:
■ execution of RDTICK when nonprivileged access to TICK is disabled
(TICK.npt = 1)
■ execution of RDSTICK when nonprivileged access to STICK is disabled
(STICK.npt = 1)
■ execution of RDPIC when nonprivileged access to PIC is disabled (PCR.priv = 1)

Implementation | RDasr shares an opcode with MEMBAR; it is distinguished by
Note | rs1 ≠ 15 or rd ≠ 0 or (i = 0 and bit 12 = 0).

Exceptions illegal_instruction
privileged_opcode
fp_disabled
privileged_action

See Also RDHPR on page 306
RDPR on page 307
WRasr on page 376
7.75 Read Hyperprivileged Register


Instruction    op3      Operation                       rs1    Assembly Language Syntax      Class
RDHPR          10 1001  Read hyperprivileged register                                        N2
  HPSTATE                                               0      rdhpr %hpstate, reg_rd
  HTSTATE                                               1      rdhpr %htstate, reg_rd
  Reserved                                              2
  HINTP                                                 3      rdhpr %hintp, reg_rd
  Reserved                                              4
  HTBA                                                  5      rdhpr %htba, reg_rd
  HVER                                                  6      rdhpr %hver, reg_rd
  Reserved                                              7-30
  HSTICK_CMPR                                           31     rdhpr %hstick_cmpr, reg_rd














Format: op = 10 (31:30) | rd (29:25) | op3 = 10 1001 (24:19) | rs1 (18:14) | unused (13:0)

Description This instruction reads the contents of the specified hyperprivileged state register into
the destination register, R[rd]. The rs1 field in the RDHPR instruction determines
which hyperprivileged register is read.

There are MAXTL copies of the HTSTATE register. A read from HTSTATE returns the
value in the copy of HTSTATE indexed by the current value in the trap level register
(TL).

An attempt to execute a RDHPR instruction when any of the following conditions
exist causes an illegal_instruction exception:
■ instruction bits 13:0 are nonzero
■ rs1 = 2, rs1 = 4, or 7 ≤ rs1 ≤ 30 (reserved rs1 values)
■ HPSTATE.hpriv = 0 (the processor is not in hyperprivileged mode)
■ rs1 = 1 (attempt to read the HTSTATE register) while TL = 0 (current trap level is
zero)

Exceptions illegal_instruction

See Also RDasr on page 303
RDPR on page 307
WRHPR on page 380
7.76 Read Privileged Register





Instruction    op3      Operation                  rs1    Assembly Language Syntax     Class
RDPR^P         10 1010  Read Privileged register                                       N2
  TPC                                              0      rdpr %tpc, reg_rd
  TNPC                                             1      rdpr %tnpc, reg_rd
  TSTATE                                           2      rdpr %tstate, reg_rd
  TT                                               3      rdpr %tt, reg_rd
  TICK                                             4      rdpr %tick, reg_rd
  TBA                                              5      rdpr %tba, reg_rd
  PSTATE                                           6      rdpr %pstate, reg_rd
  TL                                               7      rdpr %tl, reg_rd
  PIL                                              8      rdpr %pil, reg_rd
  CWP                                              9      rdpr %cwp, reg_rd
  CANSAVE                                          10     rdpr %cansave, reg_rd
  CANRESTORE                                       11     rdpr %canrestore, reg_rd
  CLEANWIN                                         12     rdpr %cleanwin, reg_rd
  OTHERWIN                                         13     rdpr %otherwin, reg_rd
  WSTATE                                           14     rdpr %wstate, reg_rd
  Reserved                                         15
  GL                                               16     rdpr %gl, reg_rd
  Reserved                                         17-31

Format: op = 10 (31:30) | rd (29:25) | op3 = 10 1010 (24:19) | rs1 (18:14) | unused (13:0)

Description The rs1 field in the instruction determines the privileged register that is read. There
are MAXTL copies of the TPC, TNPC, TT, and TSTATE registers. A read from one of
these registers returns the value in the register indexed by the current value in the
trap level register (TL). A read of TPC, TNPC, TT, or TSTATE when the trap level is
zero (TL = 0) causes an illegal_instruction exception.

An attempt to execute a RDPR instruction when any of the following conditions
exist causes an illegal_instruction exception:
■ instruction bits 13:0 are nonzero
■ rs1 = 15, or 17 ≤ rs1 ≤ 31 (reserved rs1 values)
■ 0 ≤ rs1 ≤ 3 (attempt to read TPC, TNPC, TSTATE, or TT register) while TL = 0
(current trap level is zero) and the virtual processor is in privileged or
hyperprivileged mode.

Implementation | In nonprivileged mode, an illegal_instruction exception due to
Note | 0 ≤ rs1 ≤ 3 and TL = 0 does not occur; the privileged_opcode
exception occurs instead.
An attempt to execute a RDPR instruction in nonprivileged mode (PSTATE.priv = 0
and HPSTATE.hpriv = 0) causes a privileged_opcode exception.
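The RDPR checks above can be collected into a hypothetical C function. `rdpr_check` and `rdpr_outcome` are illustrative names, and `privileged` means PSTATE.priv = 1 or HPSTATE.hpriv = 1. The order in which the checks are applied is an assumption: the text establishes the precedence of privileged_opcode only for the 0 ≤ rs1 ≤ 3, TL = 0 case.

```c
#include <stdbool.h>

typedef enum {
    RDPR_OK,
    RDPR_ILLEGAL_INSTRUCTION,
    RDPR_PRIVILEGED_OPCODE
} rdpr_outcome;

/* rs1: 5-bit field (0-31); bits13_0: instruction bits 13:0; tl: current TL. */
rdpr_outcome rdpr_check(unsigned rs1, unsigned bits13_0, bool privileged, unsigned tl)
{
    /* Assumption: in nonprivileged mode, privileged_opcode is reported for
       every encoding. */
    if (!privileged)
        return RDPR_PRIVILEGED_OPCODE;
    if (bits13_0 != 0)
        return RDPR_ILLEGAL_INSTRUCTION;
    if (rs1 == 15 || (rs1 >= 17 && rs1 <= 31))
        return RDPR_ILLEGAL_INSTRUCTION;   /* reserved rs1 values */
    if (rs1 <= 3 && tl == 0)
        return RDPR_ILLEGAL_INSTRUCTION;   /* TPC, TNPC, TSTATE, TT at TL = 0 */
    return RDPR_OK;
}
```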


Historical | On some early SPARC implementations, floating-point exceptions
Note | could cause deferred traps. To ensure that execution could be
correctly resumed after handling a deferred trap, hardware
provided a floating-point queue (FQ), from which the address of
the trapping instruction could be obtained by the trap handler.
The front of the FQ was accessed by executing a RDPR instruction
with rs1 = 15.

On UltraSPARC Architecture implementations, all floating-point
traps are precise. When one occurs, the address of a trapping
instruction can be found by the trap handler in the TPC[TL], so no
floating-point queue (FQ) is needed or implemented (impl. dep.
#25-V8) and RDPR with rs1 = 15 generates an illegal_instruction
exception.

Exceptions illegal_instruction
privileged_opcode

See Also RDasr on page 303
RDHPR on page 306
WRPR on page 382
7.77 RESTORE


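Instruction  op3      Operation                Assembly Language Syntax              Class
RESTORE      11 1101  Restore Caller's Window  restore reg_rs1, reg_or_imm, reg_rd   A1

Format (i = 0): op = 10 (31:30) | rd (29:25) | op3 = 11 1101 (24:19) | rs1 (18:14) | i = 0 (13) | unused (12:5) | rs2 (4:0)
Format (i = 1): op = 10 (31:30) | rd (29:25) | op3 = 11 1101 (24:19) | rs1 (18:14) | i = 1 (13) | simm13 (12:0)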


Description The RESTORE instruction restores the register window saved by the last SAVE 
instruction executed by the current process. The in registers of the old window 
become the out registers of the new window. The in and local registers in the new 
window contain the previous values. 


Furthermore, if and only if a fill trap is not generated, RESTORE behaves like a 
normal ADD instruction, except that the source operands R[rs1] or R[rs2] are read 
from the old window (that is, the window addressed by the original CWP) and the 
sum is written into R[rd] of the new window (that is, the window addressed by the 
new CWP). 


Note | CWP arithmetic is performed modulo the number of implemented
windows, N_REG_WINDOWS.


Programming | Typically, if a RESTORE instruction traps, the fill trap handler
Notes | returns to the trapped instruction to reexecute it. So, although the
ADD operation is not performed the first time (when the
instruction traps), it is performed the second time the instruction
executes. The same applies to changing the CWP.


There is a performance trade-off to consider between using SAVE/ 
RESTORE and saving and restoring selected registers explicitly. 





Description (Effect on Privileged State)
If a RESTORE instruction does not trap, it decrements the CWP (mod
N_REG_WINDOWS) to restore the register window that was in use prior to the last
SAVE instruction executed by the current process. It also updates the state of the
register windows by decrementing CANRESTORE and incrementing CANSAVE.
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If the register window to be restored has been spilled (CANRESTORE = 0), then a 
fill trap is generated. The trap vector for the fill trap is based on the values of 
OTHERWIN and WSTATE, as described in Trap Type for Spi Il/Fill Traps on page 485. 
The fill trap handler is invoked with CWP set to point to the window to be filled, 
that is, old CWP - 1. 
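The window bookkeeping described above can be sketched in C as a small state model. This is a minimal sketch, not a reference implementation: the `RestoreWindowState` structure, the `model_restore` function, and the choice of `N_REG_WINDOWS = 8` are assumptions made for illustration, and neither the ADD side effect nor the OTHERWIN/WSTATE-based trap vectoring is modeled.

```c
/* Assumed value for illustration; N_REG_WINDOWS is implementation specific. */
#define N_REG_WINDOWS 8u

/* Hypothetical model of the register-window control registers used here. */
typedef struct {
    unsigned cwp;
    unsigned cansave;
    unsigned canrestore;
} RestoreWindowState;

typedef enum { RESTORE_COMPLETED, RESTORE_FILL_TRAP } RestoreOutcome;

/* Models only the window-state effect of RESTORE (the ADD is not modeled).
   On a fill trap the state is left unchanged and *handler_cwp receives the
   CWP with which the fill handler would be invoked (old CWP - 1). */
RestoreOutcome model_restore(RestoreWindowState *ws, unsigned *handler_cwp) {
    unsigned prev_cwp = (ws->cwp + N_REG_WINDOWS - 1u) % N_REG_WINDOWS;
    if (ws->canrestore == 0) {
        *handler_cwp = prev_cwp;
        return RESTORE_FILL_TRAP;
    }
    ws->cwp = prev_cwp;
    ws->canrestore--;
    ws->cansave++;
    return RESTORE_COMPLETED;
}
```

Calling `model_restore` twice on a state with CANRESTORE = 1 shows both paths: the first call moves CWP back one window, and the second reports a fill trap without changing CWP.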


Programming Notes | The vectoring of fill traps can be controlled by setting the value of
the OTHERWIN and WSTATE registers appropriately. For details,
see the section "Splitting the Register Windows" in Software
Considerations, contained in the separate volume UltraSPARC
Architecture Application Notes.


The fill handler normally will end with a RESTORED instruction
followed by a RETRY instruction.





An attempt to execute a RESTORE instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal instruction exception. 


Exceptions illegal instruction 
fill n normal (n = 0-7) 
fill n other (n = 0-7) 


See Also SAVE on page 317 
7.78 RESTORED


Instruction Operation Assembly Language Syntax Class

RESTORED^P Window has been restored restored A1

[Instruction format: 10 | fcn = 0 0001 (29:25) | op3 = 11 0001 (24:19) | reserved (18:0)]


Description RESTORED adjusts the state of the register-windows control registers.
RESTORED increments CANRESTORE.
If CLEANWIN < (N_REG_WINDOWS - 1), then RESTORED increments CLEANWIN.


If OTHERWIN = 0, RESTORED decrements CANSAVE. If OTHERWIN ≠ 0, it
decrements OTHERWIN.
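The three RESTORED updates can be expressed as a short C function. This is a minimal sketch under stated assumptions: `RestoredWindowState`, `model_restored`, and `N_REG_WINDOWS = 8` are hypothetical names and values chosen for illustration, and the function does not check the preconditions stated for RESTORED.

```c
/* Assumed value for illustration; N_REG_WINDOWS is implementation specific. */
#define N_REG_WINDOWS 8u

/* Hypothetical model of the register-window control registers. */
typedef struct {
    unsigned cansave;
    unsigned canrestore;
    unsigned otherwin;
    unsigned cleanwin;
} RestoredWindowState;

/* Applies the RESTORED updates. The caller must respect the preconditions
   stated for RESTORED (for example, CANSAVE != 0); this sketch does not
   check them, and real hardware behavior is undefined if they are violated. */
void model_restored(RestoredWindowState *w) {
    w->canrestore++;
    if (w->cleanwin < N_REG_WINDOWS - 1u)
        w->cleanwin++;          /* saturates at N_REG_WINDOWS - 1 */
    if (w->otherwin == 0)
        w->cansave--;
    else
        w->otherwin--;
}
```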


Programming Notes | Trap handler software for register window fills uses the
RESTORED instruction to indicate that a window has been filled
successfully. For details, see the section "Example Code for Spill
Handler" in Software Considerations, contained in the separate
volume UltraSPARC Architecture Application Notes.


Normal privileged software would probably not execute a 
RESTORED instruction from trap level zero (TL = 0). However, it 
is not illegal to do so and doing so does not cause a trap. 


Executing a RESTORED instruction outside of a window fill trap 
handler is likely to create an inconsistent window state. Hardware 
will not signal an exception, however, since maintaining a 
consistent window state is the responsibility of privileged 
software. 





If CANSAVE = 0 or CANRESTORE > (N_REG_WINDOWS - 2) just prior to execution of
a RESTORED instruction, the subsequent behavior of the processor is undefined. In
neither of these cases can RESTORED generate a register window state that is both
valid (see Register Window State Definition on page 89) and consistent with the state
prior to the RESTORED.


An attempt to execute a RESTORED instruction when instruction bits 18:0 are 
nonzero causes an illegal instruction exception. 


An attempt to execute a RESTORED instruction in nonprivileged mode (PSTATE.priv
= 0 and HPSTATE.hpriv = 0) causes a privileged opcode exception.

Exceptions illegal instruction
privileged opcode


See Also ALLCLEAN on page 150 
INVALW on page 240 
NORMALW on page 289 
OTHERW on page 291 
SAVED on page 319 
7.79 RETRY


Instruction op3 Operation Assembly Language Syntax Class

RETRY^P 11 1110 Return from Trap (retry trapped instruction) retry A1

[Instruction format: 10 | fcn = 0 0001 (29:25) | op3 = 11 1110 (24:19) | reserved (18:0)]





Description The RETRY instruction restores the saved state from TSTATE[TL] (GL, CCR, ASI,
PSTATE, and CWP) and HTSTATE[TL] (HPSTATE), sets PC and NPC, and decrements
TL. RETRY sets PC ← TPC[TL] and NPC ← TNPC[TL] (normally, the values of PC and
NPC saved at the time of the original trap).


Programming Note | The DONE and RETRY instructions are used to return from
privileged trap handlers.


If the saved TPC[TL] and TNPC[TL] were not altered by trap handler software, 
RETRY causes execution to resume at the instruction that originally caused the trap 
(“retrying” it). 


Execution of a RETRY instruction in the delay slot of a control-transfer instruction 
produces undefined results. 


When a RETRY instruction is executed in privileged mode and
HTSTATE[TL].hpstate.hpriv = 0 (which will cause the RETRY to return the virtual
processor to nonprivileged or privileged mode), the value of GL restored from
TSTATE[TL] saturates at MAXPGL. That is, if the value in TSTATE[TL].gl is greater
than MAXPGL, then MAXPGL is substituted and written to GL. This protects against
non-hyperprivileged software executing with GL > MAXPGL.
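The PC/NPC restore, TL decrement, and GL saturation rule above can be sketched in C. This is a simplified model under stated assumptions: `RetryState` and `model_retry` are hypothetical, `MAXTL = 6` and `MAXPGL = 2` are example values only (both are implementation specific), and restoration of CCR, ASI, PSTATE, and CWP is deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXTL  6u   /* assumed trap-stack depth for this sketch */
#define MAXPGL 2u   /* assumed; the real MAXPGL is implementation specific */

/* Hypothetical subset of virtual-processor state touched by RETRY. */
typedef struct {
    uint64_t tpc[MAXTL + 1], tnpc[MAXTL + 1];
    unsigned tstate_gl[MAXTL + 1];
    bool htstate_hpriv[MAXTL + 1];
    uint64_t pc, npc;
    unsigned tl, gl;
    bool priv;   /* PSTATE.priv at the time RETRY executes */
    bool hpriv;  /* HPSTATE.hpriv */
} RetryState;

/* Sketch of RETRY's effect on PC, NPC, GL, HPSTATE.hpriv, and TL.
   CCR, ASI, PSTATE, and CWP restoration is omitted. Assumes 0 < TL <= MAXTL. */
void model_retry(RetryState *s) {
    unsigned tl = s->tl;
    unsigned gl = s->tstate_gl[tl];
    bool executed_in_priv_mode = s->priv && !s->hpriv;

    /* GL saturates at MAXPGL when returning to a non-hyperprivileged mode. */
    if (executed_in_priv_mode && !s->htstate_hpriv[tl] && gl > MAXPGL)
        gl = MAXPGL;

    s->gl = gl;
    s->hpriv = s->htstate_hpriv[tl];
    s->pc = s->tpc[tl];
    s->npc = s->tnpc[tl];
    s->tl = tl - 1u;
}
```

With TSTATE[1].gl = 3 and RETRY executed in privileged mode, the model writes GL = 2 (the assumed MAXPGL); executed from hyperprivileged mode with HTSTATE[TL].hpstate.hpriv = 1, the saved GL passes through unchanged.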


If software writes invalid or inconsistent state to TSTATE or HTSTATE before 
executing RETRY, virtual processor behavior during and after execution of the 
RETRY instruction is undefined. 


The RETRY instruction does not provide an error barrier, as MEMBAR #Sync does 
(impl. dep. #215-U3). 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system. 


IMPL. DEP. #417-S10: If (1) TSTATE[TL].pstate.am = 1 and (2) a RETRY instruction 
is executed (which sets PSTATE.am to '1' by restoring the value from 
TSTATE[TL].pstate.am to PSTATE.am), it is implementation dependent whether the 
RETRY instruction masks (zeroes) the more-significant 32 bits of the values it places 
into PC and NPC. 
Exceptions. An attempt to execute the RETRY instruction when the following
condition is true causes an illegal instruction exception:

• TL = 0 and the virtual processor is in privileged mode or hyperprivileged mode
(PSTATE.priv = 1 or HPSTATE.hpriv = 1)


An attempt to execute a RETRY instruction in nonprivileged mode (PSTATE.priv = 0 
and HPSTATE.hpriv = 0) causes a privileged opcode exception. 


Implementation Note | In nonprivileged mode, the illegal instruction exception due to TL = 0
does not occur. The privileged opcode exception occurs instead,
regardless of the current trap level (TL).


A trap level zero disrupting trap can occur upon the completion of a RETRY
instruction, if the following three conditions are true after RETRY has executed:
• trap level zero exceptions are enabled (HPSTATE.tlz = 1),
• the virtual processor is in nonprivileged or privileged mode
(HPSTATE.hpriv = 0), and
• the trap level (TL) register's value is zero (TL = 0).


Exceptions illegal instruction
privileged opcode
trap level zero


See Also DONE on page 168 
7.80 RETURN


Instruction op3 Operation Assembly Language Syntax Class

RETURN 11 1001 Return return address A1

[Instruction format: 10 | reserved (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | reserved (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description The RETURN instruction causes a delayed transfer of control to the target address
and has the window semantics of a RESTORE instruction; that is, it restores the
register window prior to the last SAVE instruction. The target address is
"R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if i = 1. Registers R[rs1]
and R[rs2] come from the old window.


Like other DCTIs, all effects of RETURN (including modification of CWP) are visible 
prior to execution of the delay slot instruction. 


Programming Note | To reexecute the trapped instruction when returning from a user trap
handler, use the RETURN instruction in the delay slot of a JMPL
instruction, for example:


    jmpl %l6, %g0   ! Trapped PC supplied to user trap handler
    return %l7      ! Trapped NPC supplied to user trap handler


Programming Note | A routine that uses a register window may be structured either as:


    save %sp, -framesize, %sp
    ret             ! Same as jmpl %i7 + 8, %g0
    restore         ! Something useful like "restore
                    ! %o2, %l2, %o0"

Or as:


    save %sp, -framesize, %sp
    return %i7 + 8
    nop             ! Could do some useful work in the
                    ! caller's window, e.g., "or %o1, %o2, %o0"


An attempt to execute a RETURN instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal instruction exception. 


A RETURN instruction may cause a window fill exception as part of its RESTORE
semantics.


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system. 
A RETURN instruction causes a mem address not aligned exception if either of the
two least-significant bits of the target address is nonzero.
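The target-address rules for RETURN (sign-extended simm13, PSTATE.am masking, and the alignment check) can be sketched in C. This is a minimal sketch: `return_target` and `sign_ext13` are hypothetical helpers, and the register values are passed in directly rather than read from a modeled register file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends a 13-bit immediate field to 64 bits. */
static uint64_t sign_ext13(uint32_t field) {
    int64_t v = (int64_t)((field & 0x1FFFu) ^ 0x1000u) - 0x1000;
    return (uint64_t)v;
}

/* Computes the RETURN target from register values taken from the old window.
   Writes the address that would be sent to the memory system (upper 32 bits
   cleared when PSTATE.am = 1) and returns false when either of the two
   low-order bits is nonzero, that is, when mem_address_not_aligned would be
   raised. */
bool return_target(uint64_t r_rs1, uint64_t r_rs2, uint32_t simm13, bool i,
                   bool pstate_am, uint64_t *target) {
    uint64_t t = i ? r_rs1 + sign_ext13(simm13) : r_rs1 + r_rs2;
    if (pstate_am)
        t &= 0xFFFFFFFFu;
    *target = t;
    return (t & 3u) == 0;
}
```

For example, `return %i7 + 8` corresponds to `i = 1` with `simm13 = 8`; masking under PSTATE.am does not change the two low-order bits, so the alignment result is the same either way.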


Exceptions illegal instruction 
fill n normal (n = 0-7) 
fill n other (n = 0-7)
mem address not aligned 
SAVE


7.81 SAVE


Instruction op3 Operation Assembly Language Syntax Class

SAVE 11 1100 Save Caller's Window save reg_rs1, reg_or_imm, reg_rd A1

[Instruction format: 10 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | reserved (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description The SAVE instruction provides the routine executing it with a new register window. 
The out registers from the old window become the in registers of the new window. 
The contents of the out and the local registers in the new window are zero or contain 
values from the executing process; that is, the process sees a clean window. 


Furthermore, if and only if a spill trap is not generated, SAVE behaves like a normal 
ADD instruction, except that the source operands R[rs1] or R[rs2] are read from the 
old window (that is, the window addressed by the original CWP) and the sum is 
written into R[rd] of the new window (that is, the window addressed by the new 
CWP). 


Note | CWP arithmetic is performed modulo the number of implemented
windows, N_REG_WINDOWS.


Programming Notes | Typically, if a SAVE instruction traps, the spill trap handler returns
to the trapped instruction to reexecute it. So, although the ADD
operation is not performed the first time (when the instruction
traps), it is performed the second time the instruction executes.
The same applies to changing the CWP.


The SAVE instruction can be used to atomically allocate a new
window in the register file and a new software stack frame in
memory. For details, see the section "Leaf-Procedure
Optimization" in Software Considerations, contained in the
separate volume UltraSPARC Architecture Application Notes.


There is a performance trade-off to consider between using SAVE/RESTORE
and saving and restoring selected registers explicitly.





Description (Effect on Privileged State)
If a SAVE instruction does not trap, it increments the CWP (mod N_REG_WINDOWS)
to provide a new register window and updates the state of the register windows by
decrementing CANSAVE and incrementing CANRESTORE.


If the new register window is occupied (that is, CANSAVE = 0), a spill trap is
generated. The trap vector for the spill trap is based on the value of OTHERWIN and
WSTATE. The spill trap handler is invoked with the CWP set to point to the window
to be spilled (that is, old CWP + 2).


An attempt to execute a SAVE instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal instruction exception. 


If CANSAVE ≠ 0, the SAVE instruction checks whether the new window needs to be
cleaned. It causes a clean window trap if the number of unused clean windows is
zero, that is, (CLEANWIN - CANRESTORE) = 0. The clean window trap handler is
invoked with the CWP set to point to the window to be cleaned (that is, old
CWP + 1).
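The order of checks described above (spill first, then clean window, then the normal window update) can be sketched in C. This is an illustrative model only: `SaveWindowState`, `model_save`, and `N_REG_WINDOWS = 8` are assumptions, and neither the ADD side effect nor OTHERWIN/WSTATE-based vectoring is modeled.

```c
#define N_REG_WINDOWS 8u  /* assumed; implementation specific */

/* Hypothetical model of the register-window control registers. */
typedef struct {
    unsigned cwp;
    unsigned cansave;
    unsigned canrestore;
    unsigned cleanwin;
} SaveWindowState;

typedef enum { SAVE_COMPLETED, SAVE_SPILL_TRAP, SAVE_CLEAN_WINDOW_TRAP } SaveOutcome;

/* Models the trap checks and window-state effect of SAVE. On a trap the
   state is unchanged and *handler_cwp receives the CWP the handler runs with. */
SaveOutcome model_save(SaveWindowState *ws, unsigned *handler_cwp) {
    if (ws->cansave == 0) {
        *handler_cwp = (ws->cwp + 2u) % N_REG_WINDOWS;   /* window to spill */
        return SAVE_SPILL_TRAP;
    }
    if (ws->cleanwin == ws->canrestore) {                /* CLEANWIN - CANRESTORE = 0 */
        *handler_cwp = (ws->cwp + 1u) % N_REG_WINDOWS;   /* window to clean */
        return SAVE_CLEAN_WINDOW_TRAP;
    }
    ws->cwp = (ws->cwp + 1u) % N_REG_WINDOWS;
    ws->cansave--;
    ws->canrestore++;
    return SAVE_COMPLETED;
}
```

Comparing CLEANWIN with CANRESTORE directly, instead of subtracting, avoids unsigned wraparound in the model while expressing the same condition.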


Programming Notes | The vectoring of spill traps can be controlled by setting the value
of the OTHERWIN and WSTATE registers appropriately. For
details, see the section "Splitting the Register Windows" in
Software Considerations, contained in the separate volume
UltraSPARC Architecture Application Notes.


The spill handler normally will end with a SAVED instruction
followed by a RETRY instruction.


Exceptions illegal instruction
spill n normal (n = 0-7)
spill n other (n = 0-7)
clean window


See Also RESTORE on page 309 
SAVED


7.82 SAVED


Instruction Operation Assembly Language Syntax Class

SAVED^P Window has been saved saved A1

[Instruction format: 10 | fcn = 0 0000 (29:25) | op3 = 11 0001 (24:19) | reserved (18:0)]


Description SAVED adjusts the state of the register-windows control registers.


SAVED increments CANSAVE. If OTHERWIN = 0, SAVED decrements
CANRESTORE. If OTHERWIN ≠ 0, it decrements OTHERWIN.
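The SAVED updates mirror RESTORED with the roles of CANSAVE and CANRESTORE exchanged. The following is a minimal C sketch; `SavedWindowState` and `model_saved` are hypothetical names, and the preconditions stated for SAVED are not checked.

```c
/* Hypothetical model of the register-window control registers. */
typedef struct {
    unsigned cansave;
    unsigned canrestore;
    unsigned otherwin;
} SavedWindowState;

/* Applies the SAVED updates. Preconditions (for example, CANRESTORE != 0
   when OTHERWIN = 0) are the caller's responsibility in this sketch. */
void model_saved(SavedWindowState *w) {
    w->cansave++;
    if (w->otherwin == 0)
        w->canrestore--;
    else
        w->otherwin--;
}
```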


Programming Notes | Trap handler software for register window spills uses the SAVED
instruction to indicate that a window has been spilled
successfully. For details, see the section "Example Code for Spill
Handler" in Software Considerations, contained in the separate
volume UltraSPARC Architecture Application Notes.


Normal privileged software would probably not execute a SAVED 
instruction from trap level zero (TL = 0). However, it is not illegal 
to do so and doing so does not cause a trap. 


Executing a SAVED instruction outside of a window spill trap 
handler is likely to create an inconsistent window state. Hardware 
will not signal an exception, however, since maintaining a 
consistent window state is the responsibility of privileged 
software. 





If CANSAVE > (N_REG_WINDOWS - 2) or CANRESTORE = 0 just prior to execution of
a SAVED instruction, the subsequent behavior of the processor is undefined. In
neither of these cases can SAVED generate a register window state that is both valid
(see Register Window State Definition on page 89) and consistent with the state prior to
the SAVED.


An attempt to execute a SAVED instruction when instruction bits 18:0 are nonzero 
causes an illegal instruction exception. 


An attempt to execute a SAVED instruction in nonprivileged mode (PSTATE.priv = 0
and HPSTATE.hpriv = 0) causes a privileged opcode exception.


Exceptions illegal instruction 
privileged opcode 
See Also ALLCLEAN on page 150
INVALW on page 240 
NORMALW on page 289 
OTHERW on page 291 
RESTORED on page 311 


320 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


SDIV, SDIVcc (Deprecated)


7.83 Signed Divide (64-bit ÷ 32-bit)


The SDIV and SDIVcc instructions are deprecated and should not be used in new
software. The SDIVX instruction should be used instead.


Opcode op3 Operation Assembly Language Syntax Class
SDIV^D 00 1111 Signed Integer Divide sdiv reg_rs1, reg_or_imm, reg_rd D2
SDIVcc^D 01 1111 Signed Integer Divide and modify cc's sdivcc reg_rs1, reg_or_imm, reg_rd D2

[Instruction format: 10 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | reserved (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description


The signed divide instructions perform 64-bit by 32-bit division, producing a 32-bit
result. If i = 0, they compute "(Y :: R[rs1]{31:0}) ÷ R[rs2]{31:0}". Otherwise (that is, if
i = 1), the divide instructions compute "(Y :: R[rs1]{31:0}) ÷
(sign_ext(simm13){31:0})". In either case, if overflow does not occur, the less
significant 32 bits of the integer quotient are sign- or zero-extended to 64 bits and are
written into R[rd].


The contents of the Y register are undefined after any 64-bit by 32-bit integer divide 
operation. 


Signed Divide Signed divide (SDIV, SDIVcc) assumes a signed integer doubleword dividend
(Y :: lower 32 bits of R[rs1]) and a signed integer word divisor (lower 32 bits of
R[rs2] or lower 32 bits of sign_ext(simm13)) and computes a signed integer word
quotient (R[rd]).


Signed division rounds an inexact quotient toward zero. For example, -7 ÷ 4 equals
the rational quotient of -1.75, which rounds to -1 (not -2) when rounding toward
zero.


The result of a signed divide can overflow the low-order 32 bits of the destination 
register R[rd] under certain conditions. When overflow occurs, the largest 
appropriate signed integer is returned as the quotient in R[rd]. The conditions under 
which overflow occurs and the value returned in R[rd] under those conditions are 
specified in TABLE 7-13. 
TABLE 7-13 SDIV / SDIVcc Overflow Detection and Value Returned

Condition Under Which Overflow Occurs | Value Returned in R[rd]
Rational quotient ≥ 2^31 | 2^31 - 1 (0000 0000 7FFF FFFF₁₆)
Rational quotient ≤ -2^31 - 1 | -2^31 (FFFF FFFF 8000 0000₁₆)


When no overflow occurs, the 32-bit result is sign-extended to 64 bits and written
into register R[rd].


SDIV does not affect the condition code bits. SDIVcc writes the integer condition
code bits as shown in the following table. Note that negative (N) and zero (Z) are set
according to the value of R[rd] after it has been set to reflect overflow, if any.


Bit | Effect on bit of SDIVcc instruction
icc.n | Set to 1 if R[rd]{31} = 1; otherwise, set to 0
icc.z | Set to 1 if R[rd]{31:0} = 0; otherwise, set to 0
icc.v | Set to 1 if overflow (per TABLE 7-13); otherwise, set to 0
icc.c | Set to 0
xcc.n | Set to 1 if R[rd]{63} = 1; otherwise, set to 0
xcc.z | Set to 1 if R[rd]{63:0} = 0; otherwise, set to 0
xcc.v | Set to 0
xcc.c | Set to 0


An attempt to execute an SDIV or SDIVcc instruction when i = 0 and instruction bits
12:5 are nonzero causes an illegal instruction exception.


Exceptions illegal instruction
division by zero


See Also MULScc on page 285
RDY on page 303
UDIV[cc] on page 372
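The quotient and overflow rules in TABLE 7-13 can be sketched in C. This is an illustrative model: `model_sdiv` is hypothetical, the divisor is passed already reduced to its low 32 bits, and the division_by_zero case is left to the caller. C99 integer division truncates toward zero, which matches the rounding rule described for SDIV; the one C overflow case (INT64_MIN ÷ -1) is handled explicitly before dividing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the value SDIV would write to R[rd] and reports overflow per
   TABLE 7-13. divisor32 is R[rs2]{31:0} or sign_ext(simm13){31:0}; a zero
   divisor (division_by_zero) is outside the scope of this sketch. */
int64_t model_sdiv(uint32_t y, uint64_t r_rs1, uint32_t divisor32, bool *overflow) {
    uint64_t raw = ((uint64_t)y << 32) | (r_rs1 & 0xFFFFFFFFu);
    /* Reinterpret the 64-bit dividend Y :: R[rs1]{31:0} as two's complement. */
    int64_t dividend = (raw >> 63) ? -(int64_t)(~raw) - 1 : (int64_t)raw;
    int64_t divisor = (divisor32 & 0x80000000u) ? (int64_t)divisor32 - 0x100000000LL
                                                : (int64_t)divisor32;

    if (dividend == INT64_MIN && divisor == -1) {  /* quotient 2^63: too large */
        *overflow = true;
        return INT32_MAX;
    }
    int64_t q = dividend / divisor;  /* C division truncates toward zero */
    if (q > INT32_MAX) { *overflow = true; return INT32_MAX; }
    if (q < INT32_MIN) { *overflow = true; return INT32_MIN; }
    *overflow = false;
    return q;  /* already the sign-extended 64-bit form of the 32-bit result */
}
```

Because truncation never moves a quotient across an integer boundary, testing the truncated quotient against INT32_MAX and INT32_MIN gives the same answer as testing the rational quotient against the bounds in TABLE 7-13.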
SETHI


7.84 SETHI


Instruction op2 Operation Assembly Language Syntax Class

SETHI 100 Set High 22 Bits of Low Word sethi const22, reg_rd A1
sethi %hi(value), reg_rd

[Instruction format: 00 | rd (29:25) | op2 = 100 (24:22) | imm22 (21:0)]


Description SETHI zeroes the least significant 10 bits and the most significant 32 bits of R[rd] and 
replaces bits 31 through 10 of R[rd] with the value from its imm22 field. 


SETHI does not affect the condition codes. 


Some SETHI instructions with rd = 0 have special uses:
• rd = 0 and imm22 = 0: defined to be a NOP instruction (described in No Operation)
• rd = 0 and imm22 ≠ 0: may be used to trigger hardware performance counters in
some UltraSPARC Architecture implementations (for details, see implementation-specific
documentation).


Programming Note | The most common form of 64-bit constant generation is creating
stack offsets whose magnitude is less than 2^32. The code below can
be used to create the constant 0000 0000 ABCD 1234₁₆:

    sethi %hi(0xabcd1234), %o0
    or    %o0, 0x234, %o0

The following code shows how to create a negative constant. Note:
The immediate field of the xor instruction is sign extended and can
be used to place 1's in all of the upper 32 bits. For example, to set the
negative constant FFFF FFFF ABCD 1234₁₆:

    sethi %hi(0x5432edcb), %o0   ! note 0x5432EDCB, not 0xABCD1234
    xor   %o0, 0x1e34, %o0       ! part of imm. overlaps upper bits
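The arithmetic behind both sequences can be checked with a small C model of SETHI, `%hi()`, and 13-bit immediate sign extension. The helper names `sethi`, `hi22`, and `simm13` are hypothetical and chosen for this sketch; they model the instruction semantics described above rather than any assembler API.

```c
#include <stdint.h>

/* SETHI: bits 63:32 and 9:0 of R[rd] become zero; imm22 goes to bits 31:10. */
uint64_t sethi(uint32_t imm22) {
    return (uint64_t)(imm22 & 0x3FFFFFu) << 10;
}

/* %hi(value): bits 31:10 of a 32-bit value, as used for the imm22 field. */
uint32_t hi22(uint32_t value) {
    return value >> 10;
}

/* Sign-extends a 13-bit immediate (as OR and XOR do with simm13). */
uint64_t simm13(uint32_t field) {
    return (uint64_t)((int64_t)((field & 0x1FFFu) ^ 0x1000u) - 0x1000);
}
```

In the negative-constant case, SETHI leaves 0000 0000 5432 EC00₁₆; 0x1e34 has bit 12 set, so it sign-extends to FFFF FFFF FFFF FE34₁₆, and the XOR flips the upper 32 bits to ones while turning 5432 EC00₁₆ into ABCD 1234₁₆.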


Exceptions None 
SHUTDOWN (Deprecated)


7.85 SHUTDOWN


The SHUTDOWN instruction is deprecated and should not be used in new
software.


Instruction opf Operation Assembly Language Syntax Class

SHUTDOWN^D,P 0 1000 0000 Enter low-power mode shutdown D3

[Instruction format: 10 | reserved (29:25) | op3 = 11 0110 (24:19) | reserved (18:14) | opf (13:5) | reserved (4:0)]


Description SHUTDOWN is a deprecated, privileged instruction that was used in early 
UltraSPARC implementations to bring the virtual processor or its containing system 
into a low-power state in an orderly manner. It had no effect on software-visible 
virtual processor state. 


On an UltraSPARC Architecture implementation operating in privileged or 
hyperprivileged mode, SHUTDOWN behaves like a NOP (impl. dep. #206-U3-Cs10). 


In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware, causes an illegal instruction exception, and its effect is 
emulated in software. 


Exceptions illegal instruction (instruction not implemented in hardware) 
SIAM


7.86 Set Interval Arithmetic Mode


Instruction opf Operation Assembly Language Syntax Class

SIAM 0 1000 0001 Set the interval arithmetic mode fields in the GSR siam siam_mode B1

[Instruction format: 10 | reserved (29:25) | op3 = 11 0110 (24:19) | reserved (18:14) | opf (13:5) | reserved (4:3) | mode (2:0)]


Description The SIAM instruction sets the GSR.im and GSR.irnd fields as follows:
GSR.im ← mode{2}
GSR.irnd ← mode{1:0}

Note | When GSR.im is set to 1, all subsequent floating-point
instructions requiring round mode settings derive rounding-mode
information from the General Status Register (GSR.irnd)
instead of the Floating-Point State Register (FSR.rd).


Note | When GSR.im = 1, the processor operates in standard floating-point
mode regardless of the setting of FSR.ns.
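The SIAM field assignments and the rounding-mode selection from the first note can be sketched in C. GSR is modeled here as two separate fields rather than a full register image, because this section does not give their bit positions; `GsrIntervalFields`, `model_siam`, and `effective_rounding_mode` are hypothetical names.

```c
/* Hypothetical representation of the two GSR fields written by SIAM. */
typedef struct {
    unsigned im;    /* GSR.im */
    unsigned irnd;  /* GSR.irnd */
} GsrIntervalFields;

/* GSR.im <- mode{2}; GSR.irnd <- mode{1:0} */
void model_siam(GsrIntervalFields *gsr, unsigned siam_mode) {
    gsr->im = (siam_mode >> 2) & 1u;
    gsr->irnd = siam_mode & 3u;
}

/* Rounding-mode source for subsequent FP instructions: GSR.irnd when
   GSR.im = 1, otherwise FSR.rd. */
unsigned effective_rounding_mode(const GsrIntervalFields *gsr, unsigned fsr_rd) {
    return gsr->im ? gsr->irnd : fsr_rd;
}
```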





An attempt to execute a SIAM instruction when instruction bits 29:25, 18:14, or 4:3
are nonzero causes an illegal instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a SIAM instruction causes an fp disabled exception.


Exceptions illegal instruction
fp disabled
SIR


7.87 Software-Initiated Reset


Instruction op3 rd Operation Assembly Language Syntax Class

SIR^H 11 0000 15 Software-Initiated Reset sir simm13 A2

[Instruction format: 10 | rd = 0 1111 (29:25) | op3 = 11 0000 (24:19) | rs1 = 0 0000 (18:14) | i = 1 (13) | simm13 (12:0)]





Description SIR is a hyperprivileged instruction, used to generate a software-initiated reset (SIR).
As with other traps, a software-initiated reset performs different actions when
TL = MAXTL than it does when TL < MAXTL.


See Software-Initiated Reset (SIR) Traps on page 495 and Software-Initiated Reset (SIR) 
on page 566 for more information about software-initiated resets. 


When executed in nonprivileged or privileged mode (HPSTATE.hpriv = 0), SIR 
causes an illegal instruction exception (impl. dep. #116-V9). 


Implementation Notes | The SIR instruction shares an opcode with WRasr; they are
distinguished by the rd, rs1, and i fields (rd = 15, rs1 = 0, and i = 1
for SIR).


An instruction that uses the WRasr opcode (op = 10₂,
op3 = 11 0000₂) with i ≠ 1 is not an SIR instruction; see Write
Ancillary State Register on page 376 for specification of its
behavior.
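The decode rule in the implementation notes above can be sketched as a predicate over a 32-bit instruction word. This is an illustrative helper, not part of any real decoder: `is_sir` is a hypothetical name, and the field positions follow the bit numbering of the SIR instruction format (op in 31:30, rd in 29:25, op3 in 24:19, rs1 in 18:14, i in bit 13).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a 32-bit instruction word is SIR rather than another use of
   the WRasr opcode: op = 10 (binary), op3 = 11 0000, rd = 15, rs1 = 0, i = 1.
   The simm13 field (bits 12:0) does not affect the decision. */
bool is_sir(uint32_t insn) {
    uint32_t op  = insn >> 30;
    uint32_t rd  = (insn >> 25) & 0x1Fu;
    uint32_t op3 = (insn >> 19) & 0x3Fu;
    uint32_t rs1 = (insn >> 14) & 0x1Fu;
    uint32_t i   = (insn >> 13) & 0x1u;
    return op == 2u && op3 == 0x30u && rd == 15u && rs1 == 0u && i == 1u;
}
```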





Exceptions software initiated reset 
illegal instruction 


See Also WRasr on page 376 
SLL / SRL / SRA


7.88 Shift


Instruction op3 x Operation Assembly Language Syntax Class
SLL 10 0101 0 Shift Left Logical (32 bits) sll reg_rs1, reg_or_shcnt, reg_rd A1
SRL 10 0110 0 Shift Right Logical (32 bits) srl reg_rs1, reg_or_shcnt, reg_rd A1
SRA 10 0111 0 Shift Right Arithmetic (32 bits) sra reg_rs1, reg_or_shcnt, reg_rd A1
SLLX 10 0101 1 Shift Left Logical (64 bits) sllx reg_rs1, reg_or_shcnt, reg_rd A1
SRLX 10 0110 1 Shift Right Logical (64 bits) srlx reg_rs1, reg_or_shcnt, reg_rd A1
SRAX 10 0111 1 Shift Right Arithmetic (64 bits) srax reg_rs1, reg_or_shcnt, reg_rd A1

10 | rd | op3 | rs1 | i=0 | x | reserved | rs2

10 | rd | op3 | rs1 | i=1 | x=0 | reserved | shcnt32

10 | rd | op3 | rs1 | i=1 | x=1 | reserved | shcnt64

31 30 29 25 24 19 18 14 13 12 6 5 4 0

Description These instructions perform logical or arithmetic shift operations. 


When i = 0 and x = 0, the shift count is the least significant five bits of R[rs2]. 
When i = 0 and x = 1, the shift count is the least significant six bits of R[rs2]. When 
i = 1 and x = 0, the shift count is the immediate value specified in bits 0 through 4 of 
the instruction. 

When i = 1 and x = 1, the shift count is the immediate value specified in bits 0 
through 5 of the instruction. 


TABLE 7-14 shows the shift count encodings for all values of i and x. 


TABLE 7-14 Shift Count Encodings

i | x | Shift Count
0 | 0 | bits 4-0 of R[rs2]
0 | 1 | bits 5-0 of R[rs2]
1 | 0 | bits 4-0 of instruction
1 | 1 | bits 5-0 of instruction





SLL and SLLX shift all 64 bits of the value in R[rs1] left by the number of bits 
specified by the shift count, replacing the vacated positions with zeroes, and write 
the shifted result to R[rd]. 
SRL shifts the low 32 bits of the value in R[rs1] right by the number of bits specified
by the shift count. Zeroes are shifted into bit 31. The upper 32 bits are set to zero,
and the result is written to R[rd].


SRLX shifts all 64 bits of the value in R[rs1] right by the number of bits specified by
the shift count. Zeroes are shifted into the vacated high-order bit positions, and the
shifted result is written to R[rd].


SRA shifts the low 32 bits of the value in R[rs1] right by the number of bits specified
by the shift count and replaces the vacated positions with bit 31 of R[rs1]. The high-order
32 bits of the result are all set with bit 31 of R[rs1], and the result is written to
R[rd].


SRAX shifts all 64 bits of the value in R[rs1] right by the number of bits specified by
the shift count and replaces the vacated positions with bit 63 of R[rs1]. The shifted
result is written to R[rd].


No shift occurs when the shift count is 0, but the high-order bits are affected by the
32-bit shifts as noted above.


These instructions do not modify the condition codes.


Programming Notes | "Arithmetic left shift by 1 (and calculate overflow)" can be
effected with the ADDcc instruction.


The instruction "sra reg_rs1, 0, reg_rd" can be used to convert a 32-bit
value to 64 bits, with sign extension into the upper word. "srl
reg_rs1, 0, reg_rd" can be used to clear the upper 32 bits of R[rd].


An attempt to execute a SLL, SRL, or SRA instruction when instruction bits 11:5 are
nonzero causes an illegal instruction exception.


An attempt to execute a SLLX, SRLX, or SRAX instruction when either of the
following conditions exist causes an illegal instruction exception:


• i = 0 or x = 0, and instruction bits 11:5 are nonzero
• x = 1 and instruction bits 11:6 are nonzero


Exceptions illegal instruction
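The six shift behaviors, including the count masking of TABLE 7-14 and the upper-word effects of the 32-bit forms, can be sketched in one C function. This is a minimal model: `ShiftOp` and `model_shift` are hypothetical names. Sign filling is done explicitly with masks, so the model does not depend on C's implementation-defined right shift of negative signed values.

```c
#include <stdint.h>

typedef enum { OP_SLL, OP_SRL, OP_SRA, OP_SLLX, OP_SRLX, OP_SRAX } ShiftOp;

/* Result written to R[rd] for each shift instruction. count_src is R[rs2]
   or the immediate shcnt field; only its low 5 bits (32-bit forms) or low
   6 bits (64-bit forms) are used, as in TABLE 7-14. */
uint64_t model_shift(ShiftOp op, uint64_t r_rs1, uint64_t count_src) {
    unsigned is64 = (op == OP_SLLX || op == OP_SRLX || op == OP_SRAX);
    unsigned cnt = (unsigned)(count_src & (is64 ? 0x3Fu : 0x1Fu));
    uint32_t low = (uint32_t)r_rs1;
    uint32_t r32;
    uint64_t r64;

    switch (op) {
    case OP_SLL:
    case OP_SLLX:
        return r_rs1 << cnt;              /* all 64 bits shift left */
    case OP_SRL:
        return (uint64_t)(low >> cnt);    /* upper 32 bits become zero */
    case OP_SRLX:
        return r_rs1 >> cnt;
    case OP_SRA:
        r32 = low >> cnt;
        if ((low & 0x80000000u) && cnt != 0)
            r32 |= ~(0xFFFFFFFFu >> cnt); /* fill vacated bits with bit 31 */
        /* bits 63:32 of the result are all copies of bit 31 */
        return (r32 & 0x80000000u) ? (0xFFFFFFFF00000000ULL | r32) : (uint64_t)r32;
    case OP_SRAX:
        r64 = r_rs1 >> cnt;
        if ((r_rs1 >> 63) && cnt != 0)
            r64 |= ~(UINT64_MAX >> cnt);  /* fill vacated bits with bit 63 */
        return r64;
    }
    return 0; /* not reached for valid ShiftOp values */
}
```

The programming-note idioms fall out directly: `model_shift(OP_SRA, v, 0)` sign-extends the low word of `v` to 64 bits, and `model_shift(OP_SRL, v, 0)` clears its upper 32 bits.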
SMUL, SMULcc (Deprecated)


7.89 Signed Multiply (32-bit)


The SMUL and SMULcc instructions are deprecated and should not be used in
new software. The MULX instruction should be used instead.


Opcode op3 Operation Assembly Language Syntax Class

SMUL^D 00 1011 Signed Integer Multiply smul reg_rs1, reg_or_imm, reg_rd D2
SMULcc^D 01 1011 Signed Integer Multiply and modify cc's smulcc reg_rs1, reg_or_imm, reg_rd D2

[Instruction format: 10 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | reserved (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description The signed multiply instructions perform 32-bit by 32-bit multiplications, producing
64-bit results. They compute "R[rs1]{31:0} × R[rs2]{31:0}" if i = 0, or "R[rs1]{31:0} ×
sign_ext(simm13){31:0}" if i = 1. They write the 32 most significant bits of the
product into the Y register and all 64 bits of the product into R[rd].


Signed multiply instructions (SMUL, SMULcc) operate on signed integer word 
operands and compute a signed integer doubleword product. 


SMUL does not affect the condition code bits. SMULcc writes the integer condition 
code bits, icc and xcc, as shown below. 


Bit | Effect on bit by execution of SMULcc
icc.n | Set to 1 if product{31} = 1; otherwise, set to 0
icc.z | Set to 1 if product{31:0} = 0; otherwise, set to 0
icc.v | Set to 0
icc.c | Set to 0
xcc.n | Set to 1 if product{63} = 1; otherwise, set to 0
xcc.z | Set to 1 if product{63:0} = 0; otherwise, set to 0
xcc.v | Set to 0
xcc.c | Set to 0





Note | 32-bit negative (icc.n) and zero (icc.z) condition codes are set 
according to the less significant word of the product, not 
according to the full 64-bit result. 
Programming Notes | 32-bit overflow after SMUL or SMULcc is indicated by
Y ≠ (R[rd] >> 31), where ">>" indicates 32-bit arithmetic right-shift.
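The product, the Y write, and the overflow test from the note above can be sketched in C. This is an illustrative model: `SmulResult`, `model_smul`, `low_word_signed`, and `smul_overflow32` are hypothetical names, and `operand2` stands for R[rs2] or the sign-extended simm13. The product of two signed 32-bit values always fits in 64 bits, so no C overflow can occur.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record: the 64-bit value written to R[rd] and Y. */
typedef struct {
    uint64_t rd;
    uint32_t y;
} SmulResult;

/* Interprets the low 32 bits of a register value as a signed word. */
static int64_t low_word_signed(uint64_t v) {
    int64_t w = (int64_t)(v & 0xFFFFFFFFu);
    return (w & 0x80000000) ? w - 0x100000000LL : w;
}

/* SMUL: 32 x 32 -> 64-bit signed product; Y receives the upper 32 bits. */
SmulResult model_smul(uint64_t r_rs1, uint64_t operand2) {
    int64_t product = low_word_signed(r_rs1) * low_word_signed(operand2);
    SmulResult r;
    r.rd = (uint64_t)product;
    r.y = (uint32_t)(r.rd >> 32);
    return r;
}

/* 32-bit overflow test: Y != (R[rd]{31:0} >> 31) with an arithmetic shift,
   i.e. Y is not all copies of bit 31 of the low word. */
bool smul_overflow32(SmulResult r) {
    uint32_t sign_fill = (r.rd & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.y != sign_fill;
}
```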


An attempt to execute a SMUL or SMULcc instruction when i = 0 and instruction bits 
12:5 are nonzero causes an illegal instruction exception. 


Exceptions illegal instruction 


See Also UMUL[cc] on page 374
STB / STH / STW / STX


7.90 Store Integer


Instruction op3 Operation Assembly Language Syntax Class
STB 00 0101 Store Byte stb† reg_rd, [address] A1
STH 00 0110 Store Halfword sth‡ reg_rd, [address] A1
STW 00 0100 Store Word stw◊ reg_rd, [address] A1
STX 00 1110 Store Extended Word stx reg_rd, [address] A1

† synonyms: stub, stsb ‡ synonyms: stuh, stsh ◊ synonyms: st, stuw, stsw

[Instruction format: 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | reserved (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description The store integer instructions (except store doubleword) copy the whole extended
(64-bit) integer, the less significant word, the least significant halfword, or the least
significant byte of R[rd] into memory.


These instructions access memory using the implicit ASI (see page 104). The effective
address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


A successful store integer instruction (notably, STX) operates atomically.


An attempt to execute a store integer instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal instruction exception.


STH causes a mem address not aligned exception if the effective address is not
halfword-aligned. STW causes a mem address not aligned exception if the effective
address is not word-aligned. STX causes a mem address not aligned exception if
the effective address is not doubleword-aligned.


Exceptions illegal instruction
mem address not aligned
VA watchpoint
data access MMU error


See Also STTW on page 352
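The data selection and alignment rules for the store integer instructions can be sketched in C. This is a minimal sketch under stated assumptions: `model_store_integer` is hypothetical, a flat byte array stands in for memory, the access is taken to be big-endian (as for the implicit ASI with PSTATE.cle = 0), and MMU, watchpoint, and bounds checks are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores the least-significant `size` bytes (1, 2, 4, or 8) of R[rd] at
   mem[ea], most-significant byte first (big-endian, an assumption of this
   sketch). Returns false without storing when the address is misaligned,
   i.e. when mem_address_not_aligned would be raised. The caller must ensure
   ea + size lies within the array. */
bool model_store_integer(uint8_t *mem, uint64_t ea, uint64_t r_rd, unsigned size) {
    if (ea & (uint64_t)(size - 1u))
        return false;
    for (unsigned k = 0; k < size; k++)
        mem[ea + k] = (uint8_t)(r_rd >> (8u * (size - 1u - k)));
    return true;
}
```

With `size = 4` (STW), only the less significant word of R[rd] reaches memory; STB (`size = 1`) can never be misaligned, since the mask `size - 1` is zero.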
STBA / STHA / STWA / STXA


7.91 Store Integer into Alternate Space


Instruction op3 Operation Assembly Language Syntax Class
STBA^PASI 01 0101 Store Byte into Alternate Space stba† reg_rd, [regaddr] imm_asi A1
stba reg_rd, [reg_plus_imm] %asi
STHA^PASI 01 0110 Store Halfword into Alternate Space stha‡ reg_rd, [regaddr] imm_asi A1
stha reg_rd, [reg_plus_imm] %asi
STWA^PASI 01 0100 Store Word into Alternate Space stwa◊ reg_rd, [regaddr] imm_asi A1
stwa reg_rd, [reg_plus_imm] %asi
STXA^PASI 01 1110 Store Extended Word into Alternate Space stxa reg_rd, [regaddr] imm_asi A1
stxa reg_rd, [reg_plus_imm] %asi

† synonyms: stuba, stsba ‡ synonyms: stuha, stsha ◊ synonyms: sta, stuwa, stswa

[Instruction format: 11 | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | imm_asi (12:5) | rs2 (4:0) when i = 0; simm13 (12:0) when i = 1]


Description The store integer into alternate space instructions copy the whole extended (64-bit)
integer, the less significant word, the least significant halfword, or the least
significant byte of R[rd] into memory.


Store integer to alternate space instructions contain the address space identifier (ASI)
to be used for the store in the imm_asi field if i = 0, or in the ASI register if i = 1. The
access is privileged if bit 7 of the ASI is 0; otherwise, it is not privileged. The
effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


A successful store instruction (notably, STXA) operates atomically.


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, these instructions cause a privileged action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆,
these instructions cause a privileged action exception.
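The ASI selection and the two privilege rules above can be sketched as a small C check. This is an illustrative model only: `AsiPrivilegeCheck` and `check_store_asi` are hypothetical names. Hyperprivileged mode and the per-ASI data access exception checks are not modeled, and the `priv` argument stands for PSTATE.priv with HPSTATE.hpriv = 0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ASI_ACCESS_ALLOWED, ASI_PRIVILEGED_ACTION } AsiPrivilegeCheck;

/* Selects the ASI (imm_asi when i = 0, the ASI register when i = 1) and
   applies the privilege rules for nonprivileged and privileged mode only. */
AsiPrivilegeCheck check_store_asi(bool i, uint8_t imm_asi, uint8_t asi_reg, bool priv) {
    uint8_t asi = i ? asi_reg : imm_asi;
    if (!priv)  /* nonprivileged: ASIs with bit 7 = 0 are restricted */
        return (asi & 0x80u) ? ASI_ACCESS_ALLOWED : ASI_PRIVILEGED_ACTION;
    /* privileged: ASIs 30 through 7F (hex) are restricted */
    return (asi >= 0x30u && asi <= 0x7Fu) ? ASI_PRIVILEGED_ACTION : ASI_ACCESS_ALLOWED;
}
```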


STHA causes a mem address not aligned exception if the effective address is not
halfword-aligned. STWA causes a mem address not aligned exception if the
effective address is not word-aligned. STXA causes a mem address not aligned
exception if the effective address is not doubleword-aligned.
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STBA, STHA, and STWA can be used with any of the following ASls, subject to the 
privilege mode rules described for the privileged action exception above. Use of any 
other ASI with these instructions causes a data access exception exception. 





ASIs valid for STBA, STHA, and STWA
ASI_AS_IF_PRIV_PRIMARY          ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY        ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                     ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY          ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY        ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                        ASI_REAL_LITTLE
ASI_REAL_IO                     ASI_REAL_IO_LITTLE
ASI_PRIMARY                     ASI_PRIMARY_LITTLE
ASI_SECONDARY                   ASI_SECONDARY_LITTLE


STXA can be used with any ASI (including, but not limited to, the above list), unless
it either (a) violates the privilege mode rules described for the privileged_action
exception above or (b) is used with any of the following ASIs, which causes a
data_access_exception exception.





ASIs invalid for STXA (cause data_access_exception exception)
ASI_BLOCK_AS_IF_PRIV_PRIMARY          ASI_BLOCK_AS_IF_PRIV_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_PRIV_SECONDARY        ASI_BLOCK_AS_IF_PRIV_SECONDARY_LITTLE
ASI_BLOCK_AS_IF_USER_PRIMARY          ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_USER_SECONDARY        ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE
24₁₆ (aliased to 27₁₆, ASI_TWINX_N)   2C₁₆ (aliased to 2F₁₆, ASI_TWINX_NL)
ASI_BLOCK_AS_IF_USER_PRIMARY          ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_USER_SECONDARY        ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE
24₁₆ (deprecated ASI_QUAD_LDD)        2C₁₆ (deprecated ASI_QUAD_LDD_L)
ASI_PST8_PRIMARY                      ASI_PST8_PRIMARY_LITTLE
ASI_PST8_SECONDARY                    ASI_PST8_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT                  ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT                ASI_SECONDARY_NO_FAULT_LITTLE
ASI_PST16_PRIMARY                     ASI_PST16_PRIMARY_LITTLE
ASI_PST16_SECONDARY                   ASI_PST16_SECONDARY_LITTLE
ASI_PST32_PRIMARY                     ASI_PST32_PRIMARY_LITTLE
ASI_PST32_SECONDARY                   ASI_PST32_SECONDARY_LITTLE
ASI_FL8_PRIMARY                       ASI_FL8_PRIMARY_LITTLE
ASI_FL8_SECONDARY                     ASI_FL8_SECONDARY_LITTLE
ASI_FL16_PRIMARY                      ASI_FL16_PRIMARY_LITTLE
ASI_FL16_SECONDARY                    ASI_FL16_SECONDARY_LITTLE
ASI_BLOCK_COMMIT_PRIMARY              ASI_BLOCK_COMMIT_SECONDARY
ASI_BLOCK_PRIMARY                     ASI_BLOCK_PRIMARY_LITTLE
ASI_BLOCK_SECONDARY                   ASI_BLOCK_SECONDARY_LITTLE
V8 Compatibility | The SPARC V8 STA instruction was renamed STWA in the
Note | SPARC V9 architecture.


Exceptions  mem_address_not_aligned (all except STBA)
privileged_action
VA_watchpoint


See Also LDA on page 244 
STTWA on page 354 
7.92 Block Store


The STBLOCKF instruction is intended to be a processor-specific instruction,
which may or may not be implemented in future UltraSPARC Architecture
implementations. Therefore, it should only be used in platform-specific
dynamically-linked libraries, in hyperprivileged software, or in software created
by a runtime code generator that is aware of the specific virtual processor
implementation on which it is executing.





























Instruction  ASI Value  Operation                                     Assembly Language Syntax                  Class
STBLOCKF     16₁₆       64-byte block store to primary address        stda freg_rd, [reg_addr] #ASI_BLK_AIUP     A2
                        space, user privilege                         stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     17₁₆       64-byte block store to secondary address      stda freg_rd, [reg_addr] #ASI_BLK_AIUS     A2
                        space, user privilege                         stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     1E₁₆       64-byte block store to primary address        stda freg_rd, [reg_addr] #ASI_BLK_AIUPL    A2
                        space, little-endian, user privilege          stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     1F₁₆       64-byte block store to secondary address      stda freg_rd, [reg_addr] #ASI_BLK_AIUSL    A2
                        space, little-endian, user privilege          stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     F0₁₆       64-byte block store to primary address        stda freg_rd, [reg_addr] #ASI_BLK_P        A2
                        space                                         stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     F1₁₆       64-byte block store to secondary address      stda freg_rd, [reg_addr] #ASI_BLK_S        A2
                        space                                         stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     F8₁₆       64-byte block store to primary address        stda freg_rd, [reg_addr] #ASI_BLK_PL       A2
                        space, little-endian                          stda freg_rd, [reg_plus_imm] %asi
STBLOCKF     F9₁₆       64-byte block store to secondary address      stda freg_rd, [reg_addr] #ASI_BLK_SL       A2
                        space, little-endian                          stda freg_rd, [reg_plus_imm] %asi








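[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | imm_asi (12:5) and rs2 (4:0) when i = 0, or simm13 (12:0) when i = 1]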


Description A block store instruction references one of several special block-transfer ASIs. Block- 
transfer ASIs allow block stores to be performed accessing the same address space as 
normal stores. Little-endian ASIs (those with an ‘L’ suffix) access data in little-endian 
format; otherwise, the access is assumed to be big-endian. Byte swapping is
performed separately for each of the eight double-precision registers accessed by the 
instruction. 
Programming | The block store instruction, STBLOCKF, and its companion,
Note | LDBLOCKF, were originally defined to provide a fast
mechanism for block-copy operations.


STBLOCKF stores data from the eight double-precision floating-point registers 
specified by rd to a 64-byte-aligned memory area. The lowest-addressed eight bytes 
in memory are stored from the lowest-numbered double-precision rd. 
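The data placement rule can be modeled in C as below. This is only a sketch of where each byte lands, under the assumption that regs[0] holds the lowest-numbered double-precision register of the group; it does not model atomicity, memory ordering, or the exception checks. Byte swapping for little-endian ASIs is applied separately to each doubleword, as the description above states. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write one 64-bit value as 8 bytes in the chosen byte order. */
static void put_u64(uint8_t *p, uint64_t v, bool little_endian)
{
    for (int b = 0; b < 8; b++) {
        int shift = little_endian ? 8 * b : 8 * (7 - b);
        p[b] = (uint8_t)(v >> shift);
    }
}

/* Layout of a 64-byte block store: regs[r] goes to bytes 8*r .. 8*r+7,
   so the lowest-numbered register lands at the lowest address. */
static void block_store_layout(uint8_t mem[64], const uint64_t regs[8], bool little_endian)
{
    for (int r = 0; r < 8; r++)
        put_u64(mem + 8 * r, regs[r], little_endian);
}
```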


While a STBLOCKF operation is in progress, any of the following values may be
observed in a destination doubleword memory location: (1) the old data value, (2)
zero, or (3) the new data value. When the operation is complete, only the new data
values will be seen.
Compatibility | Software written for older UltraSPARC implementations 
Note | that reads data being written by STBLOCKF instructions 
may or may not allow for case (2) above. Such software 
should be checked to verify that either it always waits 
for STBLOCKF to complete before reading the values 
written, or that it will operate correctly if an intermediate 
value of zero (not the "old" or "new" data values) is 
observed while the STBLOCKF operation is in progress. 
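A reader that tolerates the intermediate states described above can test each observed doubleword with a check like the following hypothetical helper. It is a sketch only; how the reader learns that the STBLOCKF has completed is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `observed` is a value a reader may legitimately see in one
   destination doubleword of a STBLOCKF: old, zero, or new while the
   operation is in progress; only new once it has completed. */
static bool stblockf_observation_permitted(uint64_t old_value, uint64_t new_value,
                                           uint64_t observed, bool completed)
{
    if (completed)
        return observed == new_value;
    return observed == old_value || observed == 0 || observed == new_value;
}
```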


A block store guarantees atomicity only for each 64-bit (8-byte) portion of the 64
bytes that it stores.


Software should assume the following (where "load operation" includes load,
load-store, and LDBLOCKF instructions and "store operation" includes store,
load-store, and STBLOCKF instructions):


■ A STBLOCKF does not follow memory ordering with respect to earlier or later
load operations. If there is overlap between the addresses of destination memory
locations of a STBLOCKF and the source address of a later load operation, the
load operation may receive incorrect data. Therefore, if ordering with respect to
later load operations is important, a MEMBAR #StoreLoad instruction must be
executed between the STBLOCKF and subsequent load operations.


■ A STBLOCKF does not follow memory ordering with respect to earlier or later
store operations. Those instructions' data may commit to memory in a different
order from the one in which those instructions were issued. Therefore, if ordering
with respect to later store operations is important, a MEMBAR #StoreStore
instruction must be executed between the STBLOCKF and subsequent store
operations.


■ Unlike ordinary stores, STBLOCKFs do not follow register dependency interlocks.
Programming | STBLOCKF is intended to be a processor-specific instruction (see
Note | the warning at the top of page 335). If STBLOCKF must be used
in software intended to be portable across current and previous
processor implementations, then it must be coded to work in the
face of any implementation variation that is permitted by
implementation dependency #411-S10, described below.


IMPL. DEP. #411-S10: The following aspects of the behavior of the block store
(STBLOCKF) instruction are implementation dependent:

■ The memory ordering model that STBLOCKF follows (other than as constrained
by the rules outlined above).

■ Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of
the STBLOCKF (the recommended behavior), or only on accesses to the first eight
bytes.

■ Whether STBLOCKFs to non-cacheable (TTE.cp = 0) pages execute in strict
program order or not. If not, a STBLOCKF to a non-cacheable page causes an
illegal_instruction exception.

■ Whether STBLOCKF follows register dependency interlocks (as ordinary stores
do).

■ Whether a STBLOCKF forces the data to be written to memory and invalidates
copies in all caches present.

■ Any other restrictions on the behavior of STBLOCKF, as described in
implementation-specific documentation.


Exceptions. An illegal_instruction exception occurs if the source floating-point
registers are not aligned on an eight-register boundary.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a STBLOCKF instruction causes an fp_disabled exception.


If the least significant 6 bits of the memory address are not all zero, a
mem_address_not_aligned exception occurs.


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0 (ASIs 16₁₆, 17₁₆, 1E₁₆, and 1F₁₆), STBLOCKF causes a privileged_action
exception.
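The exception conditions above can be collected into one hypothetical C check, sketched below. The order in which the checks are applied here is an assumption made for illustration; this section does not specify exception priority. The register group is identified by its first single-precision register number (%f0 to %f62), so an eight-register boundary of double-precision registers corresponds to a multiple of 16. "Nonprivileged" means PSTATE.priv = 0 and HPSTATE.hpriv = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum stblockf_fault {
    STBLOCKF_OK,
    STBLOCKF_ILLEGAL_INSTRUCTION,
    STBLOCKF_FP_DISABLED,
    STBLOCKF_MEM_ADDRESS_NOT_ALIGNED,
    STBLOCKF_PRIVILEGED_ACTION
};

/* Hypothetical check; the order of the tests below is an assumption. */
static enum stblockf_fault stblockf_check(bool fpu_enabled, unsigned freg, uint64_t ea,
                                          uint8_t asi, bool nonprivileged)
{
    assert(freg < 64 && freg % 2 == 0);
    if (freg % 16 != 0)                      /* eight doubles span 16 single-precision numbers */
        return STBLOCKF_ILLEGAL_INSTRUCTION;
    if (!fpu_enabled)                        /* FPRS.fef = 0, PSTATE.pef = 0, or no FPU */
        return STBLOCKF_FP_DISABLED;
    if ((ea & 0x3F) != 0)                    /* least significant 6 bits must be zero */
        return STBLOCKF_MEM_ADDRESS_NOT_ALIGNED;
    if (nonprivileged && (asi & 0x80) == 0)  /* ASIs 16, 17, 1E, 1F (hex) */
        return STBLOCKF_PRIVILEGED_ACTION;
    return STBLOCKF_OK;
}
```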


An access caused by STBLOCKF may trigger a VA_watchpoint exception (impl. dep.
#411-S10).


Implementation | STBLOCKF shares an opcode with the STDFA, STPARTIALF, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 
Exceptions  illegal_instruction
mem_address_not_aligned
privileged_action
VA_watchpoint (impl. dep. #411-S10)


See Also LDBLOCKF on page 247 
7.93 Store Floating-Point


Instruction  op3      rd     Operation                               Assembly Language Syntax   Class
STF          10 0100  0-31   Store Floating-Point register           st  freg_rd, [address]     A1
STDF         10 0111  †      Store Double Floating-Point register    std freg_rd, [address]     A1
STQF         10 0110  †      Store Quad Floating-Point register      stq freg_rd, [address]     C3

† Encoded floating-point register value, as described on page 51.


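[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | bits 12:5 and rs2 (4:0) when i = 0, or simm13 (12:0) when i = 1]


Description The store single floating-point instruction (STF) copies the contents of the 32-bit
floating-point register F_S[rd] into memory.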


The store double floating-point instruction (STDF) copies the contents of 64-bit
floating-point register F_D[rd] into a word-aligned doubleword in memory. The unit
of atomicity for STDF is 4 bytes (one word).


The store quad floating-point instruction (STQF) copies the contents of 128-bit
floating-point register F_Q[rd] into a word-aligned quadword in memory. The unit of
atomicity for STQF is 4 bytes (one word).


These instructions access memory using the implicit ASI (see page 104). The effective
address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


Exceptions. An attempt to execute a STF or STDF instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STF or STDF instruction causes an
fp_disabled exception.


STF causes a mem_address_not_aligned exception if the effective memory address is
not word-aligned.


STDF requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute an STDF
instruction causes an STDF_mem_address_not_aligned exception. In this case, trap
handler software must emulate the STDF instruction and return (impl. dep.
#110-V9-Cs10(a)).
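A trap handler's emulation of a word-aligned STDF might look like the following C sketch. It assumes big-endian byte order for the implicit ASI, a byte buffer standing in for memory, and a handler that has already fetched the value of F_D[rd]; the function names are hypothetical. Writing the value as two separate word stores is consistent with the one-word unit of atomicity stated for STDF.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit word in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t w)
{
    p[0] = (uint8_t)(w >> 24);
    p[1] = (uint8_t)(w >> 16);
    p[2] = (uint8_t)(w >> 8);
    p[3] = (uint8_t)w;
}

/* Hypothetical emulation body: returns -1 if the offset is not even
   word-aligned, otherwise stores F_D[rd] as two words and returns 0. */
static int emulate_stdf_misaligned(uint8_t *mem, uint64_t offset, uint64_t f_d)
{
    if (offset % 4 != 0)
        return -1;
    put_be32(mem + offset, (uint32_t)(f_d >> 32));  /* more-significant word first */
    put_be32(mem + offset + 4, (uint32_t)f_d);      /* then the less-significant word */
    return 0;
}
```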
STF / STDF / STQF / STXFSR


STQF requires only word alignment in memory. If the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQF instruction causes
an STQF_mem_address_not_aligned exception. In this case, trap handler software
must emulate the STQF instruction and return (impl. dep. #112-V9-Cs10(a)).





Programming | Some compilers issued sequences of single-precision stores for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9, since emulation of misaligned 
stores is expected to be fast, compilers should issue sets of single- 
precision stores only when they can determine that double- or 
quadword operands are not properly aligned.


An attempt to execute an STQF instruction when rd{1} ≠ 0 causes an
fp_exception_other (FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including STQF) that refer to
quad-precision floating-point registers, the
STQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.





Exceptions  illegal_instruction
fp_disabled
STDF_mem_address_not_aligned
STQF_mem_address_not_aligned (not used in UltraSPARC Architecture 2005)
mem_address_not_aligned
fp_exception_other (FSR.ftt = invalid_fp_register (STQF only))
VA_watchpoint


See Also  Load Floating-Point Register on page 251
Block Store on page 335
Store Floating-Point into Alternate Space on page 341
Store Floating-Point State Register (Lower) on page 345
Store Short Floating-Point on page 350
Store Partial Floating-Point on page 347
Store Floating-Point State Register on page 357
7.94 Store Floating-Point into Alternate Space


Instruction   op3      rd     Operation                               Assembly Language Syntax             Class
STFA^PASI     11 0100  0-31   Store Floating-Point Register to        sta  freg_rd, [reg_addr] imm_asi      A1
                              Alternate Space                         sta  freg_rd, [reg_plus_imm] %asi
STDFA^PASI    11 0111  †      Store Double Floating-Point             stda freg_rd, [reg_addr] imm_asi      A1
                              Register to Alternate Space             stda freg_rd, [reg_plus_imm] %asi
STQFA^PASI    11 0110  †      Store Quad Floating-Point               stqa freg_rd, [reg_addr] imm_asi      C3
                              Register to Alternate Space             stqa freg_rd, [reg_plus_imm] %asi

† Encoded floating-point register value, as described on page 51.


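[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | imm_asi (12:5) and rs2 (4:0) when i = 0, or simm13 (12:0) when i = 1]


Description The store single floating-point into alternate space instruction (STFA) copies the
contents of the 32-bit floating-point register F_S[rd] into memory.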


The store double floating-point into alternate space instruction (STDFA) copies the
contents of 64-bit floating-point register F_D[rd] into a word-aligned doubleword in
memory. The unit of atomicity for STDFA is 4 bytes (one word).


The store quad floating-point into alternate space instruction (STQFA) copies the
contents of 128-bit floating-point register F_Q[rd] into a word-aligned quadword in
memory. The unit of atomicity for STQFA is 4 bytes (one word).


Store floating-point into alternate space instructions contain the address space
identifier (ASI) to be used for the store in the imm_asi field if i = 0 or in the ASI
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not
privileged. The effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


Programming | Some compilers issued sequences of single-precision stores for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9, since emulation of misaligned 
stores is expected to be fast, compilers should issue sets of single- 
precision stores only when they can determine that double- or 
quadword operands are not properly aligned. 


Exceptions. STFA causes a mem_address_not_aligned exception if the effective
memory address is not word-aligned.


STDFA requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute an STDFA
instruction causes an STDF_mem_address_not_aligned exception. In this case, trap
handler software must emulate the STDFA instruction and return (impl. dep.
#110-V9-Cs10(b)).


STQFA requires only word alignment in memory. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQFA
instruction may cause an STQF_mem_address_not_aligned exception. In this case,
the trap handler software must emulate the STQFA instruction and return (impl.
dep. #112-V9-Cs10(b)).





Implementation | STDFA shares an opcode with the STBLOCKF, STPARTIALF, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 


An attempt to execute an STQFA instruction when rd{1} ≠ 0 causes an
fp_exception_other (FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including STQFA) that refer to
quad-precision floating-point registers, the
STQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.





In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, this instruction causes a privileged_action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, this
instruction causes a privileged_action exception.


STFA and STQFA can be used with any of the following ASIs, subject to the privilege
mode rules described for the privileged_action exception above. Use of any other ASI
with these instructions causes a data_access_exception exception.


ASIs valid for STFA and STQFA
ASI_NUCLEUS                     ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY          ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY        ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                        ASI_REAL_LITTLE
ASI_REAL_IO                     ASI_REAL_IO_LITTLE
ASI_PRIMARY                     ASI_PRIMARY_LITTLE
ASI_SECONDARY                   ASI_SECONDARY_LITTLE
STFA / STDFA / STQFA


STDFA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
the STDFA instruction causes a data_access_exception exception.





ASIs valid for STDFA
ASI_AS_IF_PRIV_PRIMARY              ASI_AS_IF_PRIV_PRIMARY_LITTLE
ASI_AS_IF_PRIV_SECONDARY            ASI_AS_IF_PRIV_SECONDARY_LITTLE
ASI_NUCLEUS                         ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY              ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY            ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                            ASI_REAL_LITTLE
ASI_REAL_IO                         ASI_REAL_IO_LITTLE
ASI_PRIMARY                         ASI_PRIMARY_LITTLE
ASI_SECONDARY                       ASI_SECONDARY_LITTLE
ASI_BLOCK_AS_IF_USER_PRIMARY†       ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE†
ASI_BLOCK_AS_IF_USER_SECONDARY†     ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE†
ASI_BLOCK_PRIMARY†                  ASI_BLOCK_PRIMARY_LITTLE†
ASI_BLOCK_SECONDARY†                ASI_BLOCK_SECONDARY_LITTLE†
ASI_BLOCK_COMMIT_PRIMARY†           ASI_BLOCK_COMMIT_SECONDARY†
ASI_FL8_PRIMARY‡                    ASI_FL8_PRIMARY_LITTLE‡
ASI_FL8_SECONDARY‡                  ASI_FL8_SECONDARY_LITTLE‡
ASI_FL16_PRIMARY‡                   ASI_FL16_PRIMARY_LITTLE‡
ASI_FL16_SECONDARY‡                 ASI_FL16_SECONDARY_LITTLE‡
ASI_PST8_PRIMARY*                   ASI_PST8_PRIMARY_LITTLE*
ASI_PST8_SECONDARY*                 ASI_PST8_SECONDARY_LITTLE*
ASI_PST16_PRIMARY*                  ASI_PST16_PRIMARY_LITTLE*
ASI_PST16_SECONDARY*                ASI_PST16_SECONDARY_LITTLE*
ASI_PST32_PRIMARY*                  ASI_PST32_PRIMARY_LITTLE*
ASI_PST32_SECONDARY*                ASI_PST32_SECONDARY_LITTLE*
































† If this ASI is used with the opcode for STDFA, the STBLOCKF instruction is
executed instead of STDFA. For behavior of STBLOCKF, see Block Store on page 335.
‡ If this ASI is used with the opcode for STDFA, the STSHORTF instruction
is executed instead of STDFA. For behavior of STSHORTF, see
Store Short Floating-Point on page 350.
* If this ASI is used with the opcode for STDFA, the STPARTIALF instruction
is executed instead of STDFA. For behavior of STPARTIALF, see
Store Partial Floating-Point on page 347.


Exceptions  illegal_instruction
fp_disabled
STDF_mem_address_not_aligned
STQF_mem_address_not_aligned (STQFA only) (not used in UA-2005)
mem_address_not_aligned
fp_exception_other (FSR.ftt = invalid_fp_register (STQFA only))
privileged_action
VA_watchpoint


See Also Load Floating-Point from Alternate Space on page 254 
Block Store on page 335 
Store Floating-Point on page 339 
Store Short Floating-Point on page 350 
Store Partial Floating-Point on page 347 
7.95 Store Floating-Point State Register (Lower)
The STFSR instruction is deprecated and should not be used in new software. 
The STXFSR instruction should be used instead. 
Opcode     op3      rd     Operation                                     Assembly Language Syntax   Class
STFSR^D    10 0101  0      Store Floating-Point State Register (Lower)   st %fsr, [address]         D2
           10 0101  1-31   (see page 357)


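[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 = 10 0101 (24:19) | rs1 (18:14) | i (13) | bits 12:5 and rs2 (4:0) when i = 0, or simm13 (12:0) when i = 1]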


Description The Store Floating-point State Register (Lower) instruction (STFSR) waits for any 
currently executing FPop instructions to complete, and then it writes the less- 
significant 32 bits of FSR into memory. 


After writing the FSR to memory, STFSR zeroes FSR.ftt.


V9 Compatibility | FSR.ftt should not be zeroed until it is known that the store will 
Note | not cause a precise trap. 
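The STFSR behavior described above can be modeled as below. This sketch assumes the SPARC V9 FSR layout, in which ftt occupies bits 16:14 (this section does not restate the field position), and it clears ftt only after producing the stored value, in line with the compatibility note. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FSR_FTT_SHIFT 14
#define FSR_FTT_MASK  ((uint64_t)0x7 << FSR_FTT_SHIFT)  /* assumed: FSR.ftt = bits 16:14 */

/* Returns the 32-bit value STFSR writes to memory (the less-significant
   32 bits of FSR), then zeroes FSR.ftt in *fsr. */
static uint32_t stfsr_model(uint64_t *fsr)
{
    uint32_t stored = (uint32_t)(*fsr & 0xFFFFFFFFu);
    *fsr &= ~FSR_FTT_MASK;     /* only after the store is known not to trap */
    return stored;
}
```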


STFSR accesses memory using the implicit ASI (see page 104). The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.


An attempt to execute a STFSR instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STFSR instruction causes an
fp_disabled exception.


STFSR causes a mem_address_not_aligned exception if the effective memory
address is not word-aligned.


V9 Compatibility | Although STFSR is deprecated, UltraSPARC Architecture
Note | implementations continue to support it for compatibility with
existing SPARC V8 software. The STFSR instruction is defined
to store only the less-significant 32 bits of the FSR into memory,
while STXFSR allows SPARC V9 software to store all 64 bits of
the FSR.


Implementation | STFSR shares an opcode with the STXFSR instruction (and
Note | possibly with other implementation-dependent instructions);
they are differentiated by the instruction rd field. An attempt to
execute the op = 11₂, op3 = 10 0101 opcode with an invalid rd
value causes an illegal_instruction exception.





Exceptions  illegal_instruction
fp_disabled
mem_address_not_aligned
VA_watchpoint


See Also Store Floating-Point on page 339 
Store Floating-Point State Register on page 357 
7.96 Store Partial Floating-Point























Instruction  ASI Value  Operation                                  Assembly Language Syntax†                        Class
STPARTIALF   C0₁₆       Eight 8-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_P      C3
                        primary address space
STPARTIALF   C1₁₆       Eight 8-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_S      C3
                        secondary address space
STPARTIALF   C8₁₆       Eight 8-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_PL     C3
                        primary address space, little-endian
STPARTIALF   C9₁₆       Eight 8-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_SL     C3
                        secondary address space, little-endian
STPARTIALF   C2₁₆       Four 16-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_P     C3
                        primary address space
STPARTIALF   C3₁₆       Four 16-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_S     C3
                        secondary address space
STPARTIALF   CA₁₆       Four 16-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_PL    C3
                        primary address space, little-endian
STPARTIALF   CB₁₆       Four 16-bit conditional stores to          stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_SL    C3
                        secondary address space, little-endian
STPARTIALF   C4₁₆       Two 32-bit conditional stores to           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_P     C3
                        primary address space
STPARTIALF   C5₁₆       Two 32-bit conditional stores to           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_S     C3
                        secondary address space
STPARTIALF   CC₁₆       Two 32-bit conditional stores to           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_PL    C3
                        primary address space, little-endian
STPARTIALF   CD₁₆       Two 32-bit conditional stores to           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_SL    C3
                        secondary address space, little-endian

† The original assembly language syntax for a Partial Store instruction ("stda freg_rd, [reg_rs1] reg_rs2, imm_asi") has been
deprecated because of inconsistency with the rest of the SPARC assembly language. Over time, assemblers will support the new syntax
for this instruction. In the meantime, some existing assemblers may only recognize the original syntax.


[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | imm_asi (12:5) | rs2 (4:0)]


Description The partial store instructions are selected by one of the partial store ASIs with the
STDFA instruction.
STPARTIALF


Two 32-bit, four 16-bit, or eight 8-bit values from the 64-bit floating-point register
F_D[rd] are conditionally stored at the address specified by R[rs1], using the mask
specified in R[rs2]. STPARTIALF has the effect of merging selected data from its
source register, F_D[rd], into the existing data at the corresponding destination
locations.


The mask value in R[rs2] has the same format as the result specified by the pixel
compare instructions (see SIMD Signed Compare on page 180). The most significant
bit of the mask (not of the entire register) corresponds to the most significant part of
F_D[rd]. The data is stored in little-endian form in memory if the ASI name has an "L"
(or "_LITTLE") suffix; otherwise, it is stored in big-endian format.
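The merge performed by STPARTIALF can be sketched in C on the 64-bit doubleword as seen in big-endian order. Little-endian ASIs, the memory access itself, and exceptions are deliberately not modeled, the function name is hypothetical, and the sketch assumes bits of R[rs2] above the mask width are ignored. Mask bit k selects element k counted from the least-significant end, so the most significant mask bit selects the most significant part of F_D[rd], as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* elem_bits: 8, 16, or 32 (ASI_PST8_*, ASI_PST16_*, ASI_PST32_*).
   Returns the doubleword that results from merging the selected elements
   of f_d into old_mem. */
static uint64_t partial_store_merge(uint64_t old_mem, uint64_t f_d, uint64_t r_rs2,
                                    unsigned elem_bits)
{
    assert(elem_bits == 8 || elem_bits == 16 || elem_bits == 32);
    unsigned n = 64 / elem_bits;                       /* 8, 4, or 2 elements */
    uint64_t elem_mask = (1ULL << elem_bits) - 1;
    uint64_t result = old_mem;
    for (unsigned k = 0; k < n; k++) {                 /* only the low n mask bits are used */
        if ((r_rs2 >> k) & 1) {
            uint64_t field = elem_mask << (k * elem_bits);
            result = (result & ~field) | (f_d & field);
        }
    }
    return result;
}
```

For example, with ASI_PST8_* and a mask of 81₁₆, only bits 63:56 and 7:0 of F_D[rd] replace the existing data.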








FIGURE 7-29 Mask Format for Partial Store
ASI_PST8_*:  R[rs2] bits 7:0 form the 8-bit partial store mask; bit 7 masks bits 63:56, bit 6 masks bits 55:48, ..., bit 1 masks bits 15:8, and bit 0 masks bits 7:0.
ASI_PST16_*: R[rs2] bits 3:0 form the 16-bit partial store mask; bit 3 masks bits 63:48, bit 2 masks bits 47:32, bit 1 masks bits 31:16, and bit 0 masks bits 15:0.
ASI_PST32_*: R[rs2] bits 1:0 form the 32-bit partial store mask; bit 1 masks bits 63:32 and bit 0 masks bits 31:0.








Exceptions. In an UltraSPARC Architecture 2005 implementation, these
instructions are not implemented in hardware, cause a data_access_exception
exception, and are emulated in software.


An attempt to execute a STPARTIALF instruction when i = 1 causes an
illegal_instruction exception.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STPARTIALF instruction causes an
fp_disabled exception.


STPARTIALF causes a mem_address_not_aligned exception if the effective memory
address is not word-aligned.


STPARTIALF requires only word alignment in memory for eight-byte stores. If the
effective address is word-aligned but not doubleword-aligned, it generates an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the STDFA instruction and return.


IMPL. DEP. #249-U3-Cs10: For an STPARTIALF instruction, the following aspects of
data watchpoints are implementation dependent: (a) whether data watchpoint logic
examines the byte store mask in R[rs2] or it conservatively behaves as if every
Partial Store always stores all 8 bytes, and (b) whether data watchpoint logic
examines individual bits in the Virtual (Physical) Data Watchpoint Mask in the LSU
Control register DCUCR to determine which bytes are being watched or (when the
Watchpoint Mask is nonzero) it conservatively behaves as if all 8 bytes are being
watched.


ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ are only used for partial store operations. In
particular, they should not be used with the LDDFA instruction; however, if any of
them is used, the resulting behavior is specified in the LDDFA instruction
description on page 256.


Implementation | STPARTIALF shares an opcode with the STBLOCKF, STDFA, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 


Exceptions  illegal_instruction
fp_disabled
data_access_exception (not implemented in hardware in UA-2005)
data_access_MMU_error
STSHORTF





7.97 Store Short Floating-Point 





Instruction  ASI Value  Operation                                   Assembly Language Syntax                  Class
STSHORTF     D0₁₆       8-bit store to primary address space        stda freg_rd, [reg_addr] #ASI_FL8_P        C3
                                                                    stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D1₁₆       8-bit store to secondary address space      stda freg_rd, [reg_addr] #ASI_FL8_S        C3
                                                                    stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D8₁₆       8-bit store to primary address space,       stda freg_rd, [reg_addr] #ASI_FL8_PL       C3
                        little-endian                               stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D9₁₆       8-bit store to secondary address space,     stda freg_rd, [reg_addr] #ASI_FL8_SL       C3
                        little-endian                               stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D2₁₆       16-bit store to primary address space       stda freg_rd, [reg_addr] #ASI_FL16_P       C3
                                                                    stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D3₁₆       16-bit store to secondary address space     stda freg_rd, [reg_addr] #ASI_FL16_S       C3
                                                                    stda freg_rd, [reg_plus_imm] %asi
STSHORTF     DA₁₆       16-bit store to primary address space,      stda freg_rd, [reg_addr] #ASI_FL16_PL      C3
                        little-endian                               stda freg_rd, [reg_plus_imm] %asi
STSHORTF     DB₁₆       16-bit store to secondary address space,    stda freg_rd, [reg_addr] #ASI_FL16_SL      C3
                        little-endian                               stda freg_rd, [reg_plus_imm] %asi


[Instruction format: op = 11 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | imm_asi (12:5) and rs2 (4:0) when i = 0, or simm13 (12:0) when i = 1]


Description The short floating-point store instruction allows 8- and 16-bit stores to be performed 
from the floating-point registers. Short stores access the low-order 8 or 16 bits of the 
register. 


Little-endian ASIs transfer data in little-endian format to memory; otherwise,
memory is assumed to be big-endian. Short stores are typically used with the
FALIGNDATA instruction (see Align Data on page 175) to assemble or store 64 bits
on noncontiguous components.
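A minimal C model of a short store, assuming mem is a byte buffer whose index 0 corresponds to a suitably aligned address (so addr % 2 reflects the alignment of the effective address); the function name is hypothetical. It stores the low-order 8 or 16 bits of the register and reports the misaligned 16-bit case by returning -1 instead of raising mem_address_not_aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* size is 8 or 16 (bits). Returns 0 on success, -1 for a 16-bit store
   to an odd address. */
static int stshortf_model(uint8_t *mem, size_t addr, uint64_t f_d, unsigned size,
                          bool little_endian)
{
    assert(size == 8 || size == 16);
    if (size == 8) {
        mem[addr] = (uint8_t)f_d;             /* low-order 8 bits, any address */
        return 0;
    }
    if (addr % 2 != 0)
        return -1;                            /* 16-bit store needs halfword alignment */
    uint16_t h = (uint16_t)f_d;               /* low-order 16 bits */
    mem[addr]     = little_endian ? (uint8_t)h : (uint8_t)(h >> 8);
    mem[addr + 1] = little_endian ? (uint8_t)(h >> 8) : (uint8_t)h;
    return 0;
}
```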


Implementation | STSHORTF shares an opcode with the STBLOCKF, STDFA, and 
Note | STPARTIALF instructions; it is distinguished by the ASI used. 


In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause a data_access_exception exception, and are
emulated in software.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the 
FPU is not present, then an attempt to execute a STSHORTF instruction causes an 
fp disabled exception. 


STSHORTF causes a mem address not aligned exception if the effective memory 
address is not halfword-aligned. 


An 8-bit STSHORTF (using ASI D016, D116, D816, or D946) can be performed to an 
arbitrary memory address (no alignment requirement). 


A 16-bit STSHORTF (using ASI D2₁₆, D3₁₆, DA₁₆, or DB₁₆) to an address that is not
halfword-aligned (an odd address) causes a mem_address_not_aligned exception.
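The alignment rule depends only on the store size implied by the ASI. A minimal C sketch, assuming the ASI values listed in the two preceding paragraphs; the helper names `stshortf_size` and `stshortf_misaligned` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store size in bytes implied by an STSHORTF ASI, or 0 if the ASI is not
 * one of the short-store ASIs listed in this section. */
static int stshortf_size(uint8_t asi)
{
    switch (asi) {
    case 0xD0: case 0xD1: case 0xD8: case 0xD9: return 1;  /* 8-bit ASIs  */
    case 0xD2: case 0xD3: case 0xDA: case 0xDB: return 2;  /* 16-bit ASIs */
    default: return 0;
    }
}

/* True if the store must raise mem_address_not_aligned:
 * only 16-bit short stores to an odd address are misaligned. */
static bool stshortf_misaligned(uint8_t asi, uint64_t ea)
{
    return stshortf_size(asi) == 2 && (ea & 1) != 0;
}
```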


Exceptions VA_watchpoint
data_access_exception
data_access_MMU_error
STTW (Deprecated)





7.98 Store Integer Twin Word


The STTW instruction is deprecated and should not be used in new software. 
The STX instruction should be used instead. 


Opcode op3 Operation Assembly Language Syntax† Class
STTW 00 0111 Store Integer Twin Word sttw reg_rd, [address] D2


† The original assembly language syntax for this instruction used an “std” instruction mnemonic, which is now
deprecated. Over time, assemblers will support the new “sttw” mnemonic for this instruction. In the meantime,
some existing assemblers may only recognize the original “std” mnemonic.


[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description The store integer twin word instruction (STTW) copies two words from an R register 
pair into memory. The least significant 32 bits of the even-numbered R register are 
written into memory at the effective address, and the least significant 32 bits of the 
following odd-numbered R register are written into memory at the "effective 
address + 4". 
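A minimal C sketch of the STTW data movement, assuming the register file is a `uint64_t r[32]` array, rd is even (so rd + 1 is at most 31), and memory is a byte array starting at the effective address. The names `sttw_model` and `put_word` are hypothetical, and atomicity is not modeled. The `little` flag anticipates the little-endian behavior stated later in this section, where each 32-bit word is byte-swapped independently.

```c
#include <stdint.h>

/* Write the 32-bit value w at p: big-endian, or little-endian if little != 0. */
static void put_word(uint8_t *p, uint32_t w, int little)
{
    for (int i = 0; i < 4; i++) {
        int shift = little ? 8 * i : 8 * (3 - i);
        p[i] = (uint8_t)(w >> shift);
    }
}

/* Hypothetical model of STTW: stores R[rd]{31:0} at EA and R[rd+1]{31:0} at
 * EA + 4. For little-endian memory each word is swapped on its own; the two
 * words are never exchanged with each other. */
void sttw_model(const uint64_t r[32], unsigned rd, uint8_t *mem, int little)
{
    put_word(mem,     (uint32_t)r[rd],     little);  /* even register -> EA     */
    put_word(mem + 4, (uint32_t)r[rd + 1], little);  /* odd register  -> EA + 4 */
}
```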


The least significant bit of the rd field of a store twin word instruction is unused and 
should always be set to 0 by software. 


STTW accesses memory using the implicit ASI (see page 104). The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.
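The effective-address rule is shared by many instructions in this chapter. A minimal C sketch, assuming register contents are passed as `uint64_t` values and simm13 as the raw 13-bit field; `sign_ext13` and `effective_address` are hypothetical names.

```c
#include <stdint.h>

/* Sign-extend a 13-bit immediate field to 64 bits. */
static int64_t sign_ext13(uint32_t simm13)
{
    int64_t v = (int64_t)(simm13 & 0x1FFF);
    if (v & 0x1000)          /* bit 12 is the sign bit */
        v -= 0x2000;
    return v;
}

/* R[rs1] + R[rs2] if i = 0, or R[rs1] + sign_ext(simm13) if i = 1.
 * Unsigned C arithmetic wraps modulo 2^64. */
uint64_t effective_address(uint64_t rs1_val, uint64_t rs2_val, int i, uint32_t simm13)
{
    return i ? rs1_val + (uint64_t)sign_ext13(simm13) : rs1_val + rs2_val;
}
```

For example, simm13 = 1FFF₁₆ sign-extends to -1, so R[rs1] = 1000₁₆ yields an effective address of FFF₁₆.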


A successful store twin word instruction operates atomically.


IMPL. DEP. #108-V9a: It is implementation dependent whether STTW is
implemented in hardware. If not, an attempt to execute it will cause an
unimplemented_STTW exception. (STTW is implemented in hardware in all
UltraSPARC Architecture 2005 implementations.)


An attempt to execute an STTW instruction when either of the following conditions
exists causes an illegal_instruction exception:


• destination register number rd is an odd number (is misaligned)
• i = 0 and instruction bits 12:5 are nonzero
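Both illegal_instruction conditions can be checked directly on the 32-bit instruction word. A minimal C sketch, assuming the field positions shown in the instruction format (rd in bits 29:25, i in bit 13, bits 12:5 unused when i = 0); `field` and `sttw_is_illegal` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract instruction bits hi:lo (inclusive); assumes hi - lo < 31. */
static unsigned field(uint32_t insn, unsigned hi, unsigned lo)
{
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1);
}

/* True if an STTW instruction word must raise illegal_instruction. */
bool sttw_is_illegal(uint32_t insn)
{
    unsigned rd = field(insn, 29, 25);
    unsigned i  = field(insn, 13, 13);
    if (rd & 1)                              /* odd (misaligned) register pair */
        return true;
    if (i == 0 && field(insn, 12, 5) != 0)   /* unused bits must be zero */
        return true;
    return false;                            /* when i = 1, bits 12:5 are part of simm13 */
}
```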
STTW causes a mem_address_not_aligned exception if the effective address is not
doubleword-aligned.


With respect to little-endian memory, an STTW instruction behaves as if it is
composed of two 32-bit stores, each of which is byte-swapped independently before
being written into its respective destination memory word.


Programming Notes: STTW is provided for compatibility with SPARC V8. It may
execute slowly on SPARC V9 machines because of data path and register-access
difficulties. Therefore, software should avoid using STTW.

If STTW is emulated in software, an STX instruction should be used for the memory
access in the emulation code to preserve atomicity.


Exceptions unimplemented_STTW
illegal_instruction
mem_address_not_aligned
VA_watchpoint
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error
fast_data_access_protection


See Also STW/STX on page 331
STTWA on page 354
STTWA (Deprecated)





7.99 Store Integer Twin Word into Alternate Space


The STTWA instruction is deprecated and should not be used in new software. 
The STXA instruction should be used instead. 








Opcode op3 Operation Assembly Language Syntax† Class
STTWA 01 0111 Store Twin Word into Alternate Space sttwa reg_rd, [regaddr] imm_asi D2, Y3‡
sttwa reg_rd, [reg_plus_imm] %asi





† The original assembly language syntax for this instruction used an “stda” instruction mnemonic, which is now deprecated. Over
time, assemblers will support the new “sttwa” mnemonic for this instruction. In the meantime, some existing assemblers may only
recognize the original “stda” mnemonic.

‡ Y3 for restricted ASIs (00₁₆ to 7F₁₆); D2 for unrestricted ASIs (80₁₆ to FF₁₆)


[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description The store twin word integer into alternate space instruction (STTWA) copies two 
words from an R register pair into memory. The least significant 32 bits of the even- 
numbered R register are written into memory at the effective address, and the least 
significant 32 bits of the following odd-numbered R register are written into memory 
at the "effective address + 4". 


The least significant bit of the rd field of an STTWA instruction is unused and should 
always be set to 0 by software. 


Store integer twin word to alternate space instructions contain the address space
identifier (ASI) to be used for the store in the imm_asi field if i = 0, or in the ASI
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not
privileged. The effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.
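A minimal C sketch of ASI selection and of the privileged_action rules stated for STTWA in this section (the same rules are given for SWAPA). It assumes the imm_asi field occupies instruction bits 12:5 and that the privilege state is passed in as two flags. Treating hyperprivileged mode as never raising privileged_action is an assumption, because the text states only the nonprivileged and privileged cases. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASI for an alternate-space access: imm_asi (bits 12:5) if i = 0,
 * otherwise the current contents of the ASI register. */
uint8_t select_asi(uint32_t insn, uint8_t asi_register)
{
    bool i = (insn >> 13) & 1;
    return i ? asi_register : (uint8_t)((insn >> 5) & 0xFF);
}

/* privileged_action check. Assumption: hyperprivileged mode never raises it. */
bool raises_privileged_action(uint8_t asi, bool priv, bool hpriv)
{
    if (hpriv)
        return false;
    if (!priv)
        return (asi & 0x80) == 0;           /* nonprivileged: bit 7 of ASI = 0 */
    return asi >= 0x30 && asi <= 0x7F;      /* privileged: ASIs 30 to 7F (hex) */
}
```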


A successful store twin word instruction operates atomically. 


With respect to little-endian memory, an STTWA instruction behaves as if it is 
composed of two 32-bit stores, each of which is byte-swapped independently before 
being written into its respective destination memory word. 
IMPL. DEP. #108-V9b: It is implementation dependent whether STTWA is
implemented in hardware. If not, an attempt to execute it will cause an
unimplemented_STTW exception. (STTWA is implemented in hardware in all
UltraSPARC Architecture 2005 implementations.)


An attempt to execute an STTWA instruction with a misaligned (odd) destination
register number rd causes an illegal_instruction exception.


STTWA causes a mem_address_not_aligned exception if the effective address is not
doubleword-aligned.


In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, this instruction causes a privileged_action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, this
instruction causes a privileged_action exception.


STTWA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
this instruction causes a data_access_exception exception (impl. dep. #300-U4-Cs10).


ASIs valid for STTWA

ASI_NUCLEUS                  ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY       ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY     ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                     ASI_REAL_LITTLE
ASI_REAL_IO                  ASI_REAL_IO_LITTLE
ASI_PRIMARY                  ASI_PRIMARY_LITTLE
ASI_SECONDARY                ASI_SECONDARY_LITTLE


Programming Note: Nontranslating ASIs (see page 421) may only be accessed using
STXA (not STTWA) instructions. If an STTWA referencing a nontranslating ASI is
executed, per the above table, it generates a data_access_exception exception
(impl. dep. #300-U4-Cs10).


Programming Note: STTWA is provided for compatibility with existing SPARC V8
software. It may execute slowly on SPARC V9 machines because of data path and
register-access difficulties. Therefore, software should avoid using STTWA.

If STTWA is emulated in software, the STXA instruction should be used for the
memory access in the emulation code to preserve atomicity.


Exceptions unimplemented_STTW
illegal_instruction
mem_address_not_aligned
privileged_action
VA_watchpoint
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error
fast_data_access_protection


See Also STWA/STXA on page 332
STTW on page 352
STXFSR





7.100 Store Floating-Point State Register 


Instruction op3 rd Operation Assembly Language Syntax Class
STFSR 10 0101 0 (deprecated; see page 345)
STXFSR 10 0101 1 Store Floating-Point State register stx %fsr, [address] A1
(none) 10 0101 2 to 31 Reserved





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description The store floating-point state register instruction (STXFSR) waits for any currently 
executing FPop instructions to complete, and then it writes all 64 bits of the FSR into 
memory. 


STXFSR zeroes FSR.ftt after writing the FSR to memory. 
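The ordering matters: memory receives the FSR value before FSR.ftt is cleared. A minimal C sketch, assuming the SPARC V9 placement of FSR.ftt in bits 16:14 (not restated in this section); `stxfsr_model` is a hypothetical name, and waiting for outstanding FPops is not modeled.

```c
#include <stdint.h>

/* Assumption: FSR.ftt occupies bits 16:14, as in SPARC V9. */
#define FSR_FTT_SHIFT 14
#define FSR_FTT_MASK  (UINT64_C(7) << FSR_FTT_SHIFT)

/* Hypothetical model: write all 64 bits of FSR, then zero FSR.ftt. */
void stxfsr_model(uint64_t *fsr, uint64_t *mem_doubleword)
{
    *mem_doubleword = *fsr;      /* memory receives the value before clearing */
    *fsr &= ~FSR_FTT_MASK;       /* ftt is cleared only after the store       */
}
```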


Implementation Note: FSR.ftt should not be zeroed by STXFSR until it is known that
the store will not cause a precise trap.


STXFSR accesses memory using the implicit ASI (see page 104). The effective
address for this instruction is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


Exceptions. An attempt to execute an STXFSR instruction when i = 0 and instruction
bits 12:5 are nonzero causes an illegal_instruction exception.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute an STXFSR instruction causes an
fp_disabled exception.


If the effective address is not doubleword-aligned, an attempt to execute an
STXFSR instruction causes a mem_address_not_aligned exception.


Implementation Note: STXFSR shares an opcode with the (deprecated) STFSR
instruction (and possibly with other implementation-dependent instructions); they
are differentiated by the instruction rd field. An attempt to execute the op = 11₂,
op3 = 10 0101 opcode with an invalid rd value causes an illegal_instruction exception.
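A minimal C sketch of how the shared opcode is split by the rd field, assuming the standard field positions (op in bits 31:30, rd in 29:25, op3 in 24:19). It treats every rd value other than 0 and 1 as illegal, which ignores any implementation-dependent uses; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    FSR_STORE_STFSR,         /* rd = 0 (deprecated STFSR)            */
    FSR_STORE_STXFSR,        /* rd = 1                               */
    FSR_STORE_ILLEGAL,       /* rd = 2 to 31: illegal_instruction    */
    FSR_STORE_OTHER_OPCODE   /* not op = 11, op3 = 10 0101 at all    */
} fsr_store_kind;

fsr_store_kind classify_fsr_store(uint32_t insn)
{
    unsigned op  = insn >> 30;            /* bits 31:30 */
    unsigned rd  = (insn >> 25) & 0x1F;   /* bits 29:25 */
    unsigned op3 = (insn >> 19) & 0x3F;   /* bits 24:19 */
    if (op != 3 || op3 != 0x25)
        return FSR_STORE_OTHER_OPCODE;
    if (rd == 0)
        return FSR_STORE_STFSR;
    if (rd == 1)
        return FSR_STORE_STXFSR;
    return FSR_STORE_ILLEGAL;             /* implementation-dependent uses ignored */
}
```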


Exceptions illegal_instruction
fp_disabled
mem_address_not_aligned
VA_watchpoint


See Also Load Floating-Point State Register on page 273 
Store Floating-Point on page 339 
Store Floating-Point State Register (Lower) on page 345 
SUB





7.101  Subtract 


Instruction op3 Operation Assembly Language Syntax Class 
SUB 00 0100 Subtract sub reg_rs1, reg_or_imm, reg_rd A1
SUBcc 01 0100 Subtract and modify cc's subcc reg_rs1, reg_or_imm, reg_rd A1
SUBC 00 1100 Subtract with Carry subc reg_rs1, reg_or_imm, reg_rd A1
SUBCcc 01 1100 Subtract with Carry and modify cc's subccc reg_rs1, reg_or_imm, reg_rd A1


[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description These instructions compute "R[rs1] - R[rs2]" if i = 0, or
"R[rs1] - sign_ext(simm13)" if i = 1, and write the difference into R[rd].


SUBC and SUBCcc ("SUBtract with carry") also subtract the CCR register's 32-bit
carry (icc.c) bit; that is, they compute "R[rs1] - R[rs2] - icc.c" or
"R[rs1] - sign_ext(simm13) - icc.c" and write the difference into R[rd].


SUBcc and SUBCcc modify the integer condition codes (CCR.icc and CCR.xcc). A
32-bit overflow (CCR.icc.v) occurs on subtraction if bit 31 (the sign) of the operands
differs and bit 31 (the sign) of the difference differs from R[rs1]{31}. A 64-bit
overflow (CCR.xcc.v) occurs on subtraction if bit 63 (the sign) of the operands differs
and bit 63 (the sign) of the difference differs from R[rs1]{63}.
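A minimal C sketch of the SUBcc and SUBCcc condition-code rules, assuming operands arrive as `uint64_t` values and carry_in is 0 for SUBcc or CCR.icc.c for SUBCcc. The overflow tests follow the text above; treating C as the borrow out of bit 31 (icc) or bit 63 (xcc) is an assumption, consistent with the Tcc table where C set means "Less Than, Unsigned". The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } cond_codes;

/* Hypothetical model: R[rd] = a - b - carry_in, plus the icc and xcc bits. */
uint64_t subcc_model(uint64_t a, uint64_t b, bool carry_in,
                     cond_codes *icc, cond_codes *xcc)
{
    uint64_t d = a - b - (uint64_t)carry_in;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, d32 = (uint32_t)d;

    icc->n = (d32 >> 31) & 1;
    icc->z = d32 == 0;
    /* operand signs differ AND result sign differs from R[rs1]{31} */
    icc->v = (((a32 ^ b32) & (a32 ^ d32)) >> 31) & 1;
    icc->c = (uint64_t)a32 < (uint64_t)b32 + carry_in;   /* borrow out of bit 31 */

    xcc->n = (d >> 63) & 1;
    xcc->z = d == 0;
    xcc->v = (((a ^ b) & (a ^ d)) >> 63) & 1;
    xcc->c = a < b || (a - b) < (uint64_t)carry_in;      /* borrow out of bit 63 */
    return d;
}
```

With rd = 0 this is the comparison mentioned in the programming note that follows: only the condition codes are of interest.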

Programming Notes: A SUBcc instruction with rd = 0 can be used to effect a signed or
unsigned integer comparison. See the cmp synthetic instruction in Appendix C,
Assembly Language Syntax.

SUBC and SUBCcc read the 32-bit condition codes’ carry bit (CCR.icc.c), not the
64-bit condition codes’ carry bit (CCR.xcc.c).


An attempt to execute a SUB instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
SWAP (Deprecated)





7.102 Swap Register with Memory


The SWAP instruction is deprecated and should not be used in new software. 
The CASA or CASXA instruction should be used instead. 

Opcode op3 Operation Assembly Language Syntax Class
SWAP 00 1111 Swap Register with Memory swap [address], reg_rd D2











[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description SWAP exchanges the less significant 32 bits of R[rd] with the contents of
the word at the addressed memory location. The upper 32 bits of R[rd] are set to 0.
The operation is performed atomically, that is, without allowing intervening
interrupts or deferred traps. In a multiprocessor system, two or more virtual
processors executing CASA, CASXA, SWAP, SWAPA, LDSTUB, or LDSTUBA
instructions addressing any or all of the same doubleword simultaneously are
guaranteed to execute them in an undefined, but serial, order.


SWAP accesses memory using the implicit ASI (see page 104). The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.

An attempt to execute a SWAP instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


If the effective address is not word-aligned, an attempt to execute a SWAP instruction
causes a mem_address_not_aligned exception.


The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


Exceptions illegal_instruction
mem_address_not_aligned
VA_watchpoint
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error
fast_data_access_protection
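SWAP's single indivisible exchange corresponds closely to the C11 `atomic_exchange` operation. The following minimal sketch, assuming a naturally aligned `_Atomic uint32_t` memory word, models only the register and memory results (the old word is returned zero-extended). Alignment traps and ASIs are not modeled, `swap_model` is a hypothetical name, and new code should still use CASA or CASXA as recommended at the start of this section.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of SWAP: exchange R[rd]{31:0} with the memory word in
 * one atomic step; the new R[rd] is the old word with upper 32 bits = 0. */
uint64_t swap_model(_Atomic uint32_t *word, uint64_t rd_value)
{
    uint32_t old = atomic_exchange(word, (uint32_t)rd_value);
    return (uint64_t)old;
}
```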
SWAPA (Deprecated)





7.103 Swap Register with Alternate Space Memory


The SWAPA instruction is deprecated and should not be used in new software. 
The CASXA instruction should be used instead. 








Opcode op3 Operation Assembly Language Syntax Class
SWAPA 01 1111 Swap Register with Alternate Space Memory swapa [regaddr] imm_asi, reg_rd D2, Y3†
swapa [reg_plus_imm] %asi, reg_rd





† Y3 for restricted ASIs (00₁₆ to 7F₁₆); D2 for unrestricted ASIs (80₁₆ to FF₁₆)


[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description SWAPA exchanges the less significant 32 bits of R[rd] with the contents of the word 
at the addressed memory location. The upper 32 bits of R[rd] are set to 0. The 
operation is performed atomically, that is, without allowing intervening interrupts 
or deferred traps. In a multiprocessor system, two or more virtual processors 
executing CASA, CASXA, SWAP, SWAPA, LDSTUB, or LDSTUBA instructions 
addressing any or all of the same doubleword simultaneously are guaranteed to 
execute them in an undefined, but serial, order. 


The SWAPA instruction contains the address space identifier (ASI) to be used for the
access in the imm_asi field if i = 0, or in the ASI register if i = 1. The access is
privileged if bit 7 of the ASI is 0; otherwise, it is not privileged. The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.


This instruction causes a mem_address_not_aligned exception if the effective
address is not word-aligned. It causes a privileged_action exception if
PSTATE.priv = 0 and bit 7 of the ASI is 0.


The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


If the effective address is not word-aligned, an attempt to execute a SWAPA 
instruction causes a mem_address_not_aligned exception. 
In nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), if bit 7 of the ASI
is 0, this instruction causes a privileged_action exception. In privileged mode
(PSTATE.priv = 1 and HPSTATE.hpriv = 0), if the ASI is in the range 30₁₆ to 7F₁₆, this
instruction causes a privileged_action exception.


SWAPA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
this instruction causes a data_access_exception exception.


ASIs valid for SWAPA

ASI_NUCLEUS                  ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY       ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY     ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_PRIMARY                  ASI_PRIMARY_LITTLE
ASI_SECONDARY                ASI_SECONDARY_LITTLE
ASI_REAL                     ASI_REAL_LITTLE


Exceptions mem_address_not_aligned
privileged_action
VA_watchpoint
data_access_exception
fast_data_access_MMU_miss
data_access_MMU_miss
data_access_MMU_error
fast_data_access_protection





TADDcc 





7.104 Tagged Add


Instruction op3 Operation Assembly Language Syntax Class
TADDcc 10 0000 Tagged Add and modify cc's taddcc reg_rs1, reg_or_imm, reg_rd A1





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description This instruction computes a sum that is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


TADDcc modifies the integer condition codes (icc and xcc).


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the
addition generates 32-bit arithmetic overflow (that is, both operands have the same
value in bit 31 and bit 31 of the sum is different).


If a TADDcc causes a tag overflow, the 32-bit overflow bit (CCR.icc.v) is set to 1; if
TADDcc does not cause a tag overflow, CCR.icc.v is set to 0.


In either case, the remaining integer condition codes (both the other CCR.icc bits and
all the CCR.xcc bits) are also updated as they would be for a normal ADD
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit).
CCR.xcc.v is set based on the 64-bit arithmetic overflow condition, like a normal
64-bit add.


An attempt to execute a TADDcc instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction


See Also TADDccTV on page 364
TSUBcc on page 369
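A minimal C sketch of the TADDcc condition-code rules described in this section, assuming 64-bit operands as `uint64_t`. Only icc.v depends on the tag overflow condition; the other bits, including xcc.v, use ordinary two's-complement add rules (carry out and signed overflow), which are assumed here rather than restated in this section. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } cond_codes;

/* Hypothetical model of TADDcc: returns the sum and sets icc/xcc. */
uint64_t taddcc_model(uint64_t a, uint64_t b, cond_codes *icc, cond_codes *xcc)
{
    uint64_t sum = a + b;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, s32 = (uint32_t)sum;

    bool tag_bits = ((a | b) & 3) != 0;                      /* bit 1 or 0 set */
    bool ovf32 = ((~(a32 ^ b32) & (a32 ^ s32)) >> 31) & 1;   /* same signs in, new sign out */

    icc->n = (s32 >> 31) & 1;
    icc->z = s32 == 0;
    icc->v = tag_bits || ovf32;                              /* tag overflow */
    icc->c = s32 < a32;                                      /* carry out of bit 31 */

    xcc->n = (sum >> 63) & 1;
    xcc->z = sum == 0;
    xcc->v = ((~(a ^ b) & (a ^ sum)) >> 63) & 1;             /* plain 64-bit overflow */
    xcc->c = sum < a;                                        /* carry out of bit 63 */
    return sum;
}
```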
TADDccTV (Deprecated)


7.105 Tagged Add and Trap on Overflow 


The TADDccTV instruction is deprecated and should not be used in new
software. The TADDcc instruction followed by the BPVS instruction (with
instructions to save the pre-TADDcc integer condition codes if necessary) should
be used instead.








Opcode op3 Operation Assembly Language Syntax Class
TADDccTV 10 0010 Tagged Add and modify cc's or Trap on Overflow taddcctv reg_rs1, reg_or_imm, reg_rd D2





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description This instruction computes a sum that is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


TADDccTV modifies the integer condition codes if it does not trap. 


An attempt to execute a TADDccTV instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
addition generates 32-bit arithmetic overflow (that is, both operands have the same 
value in bit 31 and bit 31 of the sum is different). 


If TADDccTV causes a tag overflow, a tag_overflow exception is generated and R[rd]
and the integer condition codes remain unchanged. If a TADDccTV does not cause a
tag overflow, the sum is written into R[rd] and the integer condition codes are
updated. CCR.icc.v is set to 0 to indicate no 32-bit overflow.


When TADDccTV does not trap, the remaining integer condition codes (both the other
CCR.icc bits and all the CCR.xcc bits) are also updated as they would be for a normal
ADD instruction. In particular, the setting of the CCR.xcc.v bit is not determined by
the tag overflow condition (tag overflow is used only to set the 32-bit overflow bit).
CCR.xcc.v is set only on the basis of the normal 64-bit arithmetic overflow condition,
like a normal 64-bit add.
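TADDccTV either traps with R[rd] and the condition codes untouched, or commits its results. A minimal C sketch of that commit rule, assuming 64-bit operands as `uint64_t` and modeling only R[rd] and icc.v; `taddcctv_model` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of TADDccTV's commit rule. Returns true when the
 * tag_overflow trap is taken, in which case *rd and *icc_v are unchanged. */
bool taddcctv_model(uint64_t a, uint64_t b, uint64_t *rd, bool *icc_v)
{
    uint64_t sum = a + b;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, s32 = (uint32_t)sum;
    bool tag_overflow = ((a | b) & 3) != 0 ||
                        (((~(a32 ^ b32) & (a32 ^ s32)) >> 31) & 1);
    if (tag_overflow)
        return true;        /* trap: destination and condition codes untouched */
    *rd = sum;
    *icc_v = false;         /* no 32-bit overflow */
    return false;
}
```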


364 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 




SPARC V8 Compatibility Note: TADDccTV traps based on the 32-bit overflow condition,
just as in the SPARC V8 architecture. Although the tagged add instructions set the
64-bit condition codes CCR.xcc, there is no form of the instruction that traps on the
64-bit overflow condition.

Exceptions illegal_instruction
tag_overflow

See Also TADDcc on page 363
TSUBccTV on page 370
Tcc





7.106 Trap on Integer Condition Codes (Tcc)








Instruction op3 cond Operation cc Test Assembly Language Syntax Class

TA 111010 1000 Trap Always 1 ta i_or_x_cc, software_trap_number A1
TN 111010 0000 Trap Never 0 tn i_or_x_cc, software_trap_number A1
TNE 111010 1001 Trap on Not Equal not Z tne† i_or_x_cc, software_trap_number A1
TE 111010 0001 Trap on Equal Z te‡ i_or_x_cc, software_trap_number A1
TG 111010 1010 Trap on Greater not (Z or (N xor V)) tg i_or_x_cc, software_trap_number A1
TLE 111010 0010 Trap on Less or Equal Z or (N xor V) tle i_or_x_cc, software_trap_number A1
TGE 111010 1011 Trap on Greater or Equal not (N xor V) tge i_or_x_cc, software_trap_number A1
TL 111010 0011 Trap on Less N xor V tl i_or_x_cc, software_trap_number A1
TGU 111010 1100 Trap on Greater, Unsigned not (C or Z) tgu i_or_x_cc, software_trap_number A1
TLEU 111010 0100 Trap on Less or Equal, Unsigned C or Z tleu i_or_x_cc, software_trap_number A1
TCC 111010 1101 Trap on Carry Clear (Greater than or Equal, Unsigned) not C tcc§ i_or_x_cc, software_trap_number A1
TCS 111010 0101 Trap on Carry Set (Less Than, Unsigned) C tcs¶ i_or_x_cc, software_trap_number A1
TPOS 111010 1110 Trap on Positive or Zero not N tpos i_or_x_cc, software_trap_number A1
TNEG 111010 0110 Trap on Negative N tneg i_or_x_cc, software_trap_number A1
TVC 111010 1111 Trap on Overflow Clear not V tvc i_or_x_cc, software_trap_number A1
TVS 111010 0111 Trap on Overflow Set V tvs i_or_x_cc, software_trap_number A1

† synonym: tnz; ‡ synonym: tz; § synonym: tgeu; ¶ synonym: tlu


Description

cc1 :: cc0 Condition Codes Evaluated
00 CCR.icc
01 (illegal_instruction)
10 CCR.xcc
11 (illegal_instruction)





The Tcc instruction evaluates the selected integer condition codes (icc or xcc)
according to the cond field of the instruction, producing either a TRUE or FALSE
result. If TRUE and no higher-priority exceptions or interrupt requests are pending,
then a trap_instruction or htrap_instruction exception is generated. If FALSE, the
trap_instruction (or htrap_instruction) exception does not occur and the instruction
behaves like a NOP.
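A minimal C sketch of the condition evaluation, assuming the N, Z, V, and C bits of the selected condition codes are passed as booleans. It relies on a pattern visible in the opcode table: each condition with cond{3} = 1 is the complement of the corresponding condition with cond{3} = 0 (for example, TNE versus TE). The type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct { bool n, z, v, c; } cond_codes;

/* Evaluate the 4-bit cond field against one set of condition codes. */
bool tcc_condition(unsigned cond, cond_codes f)
{
    bool r;
    switch (cond & 7) {
    case 0:  r = false;               break;  /* TN   */
    case 1:  r = f.z;                 break;  /* TE   */
    case 2:  r = f.z || (f.n != f.v); break;  /* TLE  */
    case 3:  r = f.n != f.v;          break;  /* TL   */
    case 4:  r = f.c || f.z;          break;  /* TLEU */
    case 5:  r = f.c;                 break;  /* TCS  */
    case 6:  r = f.n;                 break;  /* TNEG */
    default: r = f.v;                 break;  /* TVS  */
    }
    return (cond & 8) ? !r : r;               /* 1xxx: complement of 0xxx */
}

/* cc1::cc0 selects icc (00) or xcc (10); cc0 = 1 is illegal.
 * Returns false for an illegal encoding. */
bool tcc_select_codes(unsigned cc1_cc0, cond_codes icc, cond_codes xcc, cond_codes *out)
{
    if (cc1_cc0 & 1)
        return false;                         /* illegal_instruction */
    *out = (cc1_cc0 & 2) ? xcc : icc;
    return true;
}
```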




















For brevity, in the remainder of this section the value of the "software trap number" 
used by Tcc will be referred to as “SWTN”. 


In nonprivileged mode, if i = 0 the SWTN is specified by the least significant seven
bits of "R[rs1] + R[rs2]". If i = 1, the SWTN is provided by the least significant seven
bits of "R[rs1] + imm_trap_#". Therefore, the valid range of values for SWTN in
nonprivileged mode is 0 to 127. The most significant 57 bits of SWTN are unused
and should be supplied as zeroes by software.


In privileged and hyperprivileged modes, if i = 0 the SWTN is specified by the least
significant eight bits of "R[rs1] + R[rs2]". If i = 1, the SWTN is provided by the least
significant eight bits of "R[rs1] + imm_trap_#". Therefore, the valid range of values
for SWTN in privileged and hyperprivileged modes is 0 to 255. The most significant
56 bits of SWTN are unused and should be supplied as zeroes by software.
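A minimal C sketch of the SWTN computation, assuming the second operand is already R[rs2] (i = 0) or the imm_trap_# field (i = 1); `tcc_swtn` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* SWTN = low 7 bits (nonprivileged) or low 8 bits (privileged or
 * hyperprivileged) of R[rs1] + operand2. */
unsigned tcc_swtn(uint64_t rs1_val, uint64_t operand2, bool nonprivileged)
{
    uint64_t sum = rs1_val + operand2;
    return (unsigned)(sum & (nonprivileged ? 0x7Fu : 0xFFu));
}
```

For example, an operand sum of 85₁₆ yields SWTN = 5 in nonprivileged mode but SWTN = 85₁₆ (133) in privileged mode, which is why nonprivileged code cannot reach the 128 to 255 range.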


Generally, values of 0 ≤ SWTN ≤ 127 are used to trap to privileged-mode software
and values of 128 ≤ SWTN ≤ 255 are used to trap to hyperprivileged-mode software.
The behavior of Tcc, based on the privilege mode in effect when it is executed and
the value of the supplied SWTN, is as follows:


Behavior of Tcc instruction

Nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0)
  0 ≤ SWTN ≤ 127: trap_instruction exception, to privileged mode (256 ≤ TT ≤ 383)
  128 ≤ SWTN ≤ 255: not possible, because SWTN is a 7-bit value in nonprivileged mode
Privileged mode (PSTATE.priv = 1 and HPSTATE.hpriv = 0)
  0 ≤ SWTN ≤ 127: trap_instruction exception, to privileged mode (256 ≤ TT ≤ 383)
  128 ≤ SWTN ≤ 255: htrap_instruction exception, to hyperprivileged mode (384 ≤ TT ≤ 511)
Hyperprivileged mode (HPSTATE.hpriv = 1)
  0 ≤ SWTN ≤ 127: htrap_instruction exception, to hyperprivileged mode (256 ≤ TT ≤ 383)
  128 ≤ SWTN ≤ 255: htrap_instruction exception, to hyperprivileged mode (384 ≤ TT ≤ 511)


Programming Note: Tcc can be used to implement breakpointing, tracing, and calls to
privileged and hyperprivileged software. It can also be used for runtime checks, such
as for out-of-range array indexes and integer overflow.


Exceptions. An attempt to execute a Tcc instruction when any of the following
conditions exist causes an illegal_instruction exception:


• instruction bit 29 is nonzero
• i = 0 and instruction bits 12:5 are nonzero
• i = 1 and instruction bits 10:8 are nonzero
• cc0 = 1


If a Tcc instruction causes a trap_instruction or htrap_instruction trap, 256 plus the
SWTN value is written into TT[TL]. Then the trap is taken and the virtual processor
performs the normal trap entry procedure, as described in Trap Processing on page
486.
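Combining the behavior table with the TT rule gives a short decision procedure. A minimal C sketch, assuming the current privilege mode is supplied as an enum and SWTN has already been masked to 7 or 8 bits; the enum and function names are hypothetical.

```c
typedef enum { PRIV_NONPRIV, PRIV_PRIV, PRIV_HYPER } priv_mode;
typedef enum { EXC_TRAP_INSTRUCTION, EXC_HTRAP_INSTRUCTION } tcc_exception;

/* Exception raised by a Tcc whose condition is TRUE, and the TT value
 * (256 + SWTN) written into TT[TL]. */
tcc_exception tcc_exception_for(priv_mode mode, unsigned swtn, unsigned *tt)
{
    *tt = 256 + swtn;
    if (mode == PRIV_HYPER || swtn >= 128)
        return EXC_HTRAP_INSTRUCTION;    /* to hyperprivileged mode */
    return EXC_TRAP_INSTRUCTION;         /* to privileged mode      */
}
```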


Exceptions illegal_instruction
trap_instruction (0 ≤ SWTN ≤ 127)
htrap_instruction (128 ≤ SWTN ≤ 255)
TSUBcc





7.107 Tagged Subtract


Instruction op3 Operation Assembly Language Syntax Class
TSUBcc 10 0001 Tagged Subtract and modify cc's tsubcc reg_rs1, reg_or_imm, reg_rd A1





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description This instruction computes "R[rs1] - R[rs2]" if i = 0, or
"R[rs1] - sign_ext(simm13)" if i = 1.


TSUBcc modifies the integer condition codes (icc and xcc).


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the
subtraction generates 32-bit arithmetic overflow; that is, the operands have different
values in bit 31 (the 32-bit sign bit) and the sign of the 32-bit difference in bit 31
differs from bit 31 of R[rs1].


If a TSUBcc causes a tag overflow, the 32-bit overflow bit (CCR.icc.v) is set to 1; if
TSUBcc does not cause a tag overflow, CCR.icc.v is set to 0.


In either case, the remaining integer condition codes (both the other CCR.icc bits and
all the CCR.xcc bits) are also updated as they would be for a normal subtract
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit).
CCR.xcc.v is set based on the 64-bit arithmetic overflow condition, like a normal 64-bit
subtract.


An attempt to execute a TSUBcc instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction


See Also TADDcc on page 363
TSUBccTV on page 370
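A minimal C sketch of the two overflow bits TSUBcc writes, assuming 64-bit operands as `uint64_t`: icc.v reports the tag overflow condition, while xcc.v uses the ordinary 64-bit subtract overflow test. `tsubcc_overflow` is a hypothetical name, and the remaining condition codes are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the overflow bits written by TSUBcc. */
void tsubcc_overflow(uint64_t a, uint64_t b, bool *icc_v, bool *xcc_v)
{
    uint64_t d = a - b;
    uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b, d32 = (uint32_t)d;
    bool tag_bits = ((a | b) & 3) != 0;                     /* bit 1 or 0 set */
    bool ovf32 = (((a32 ^ b32) & (a32 ^ d32)) >> 31) & 1;   /* 32-bit subtract overflow */
    *icc_v = tag_bits || ovf32;                             /* tag overflow */
    *xcc_v = (((a ^ b) & (a ^ d)) >> 63) & 1;               /* 64-bit overflow only */
}
```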
TSUBccTV (Deprecated)





7.108 Tagged Subtract and Trap on Overflow 


The TSUBccTV instruction is deprecated and should not be used in new
software. The TSUBcc instruction followed by the BPVS instruction (with
instructions to save the pre-TSUBcc integer condition codes if necessary) should
be used instead.








Opcode op3 Operation Assembly Language Syntax Class
TSUBccTV 10 0011 Tagged Subtract and modify cc's or Trap on Overflow tsubcctv reg_rs1, reg_or_imm, reg_rd D2





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]


Description This instruction computes "R[rs1] - R[rs2]" if i = 0, or "R[rs1] - sign_ext(simm13)"
if i = 1.


TSUBccTV modifies the integer condition codes (icc and xcc) if it does not trap. 


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
subtraction generates 32-bit arithmetic overflow; that is, the operands have different 
values in bit 31 (the 32-bit sign bit) and the sign of the 32-bit difference in bit 31 
differs from bit 31 of R[rs1]. 


An attempt to execute a TSUBccTV instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


If TSUBccTV causes a tag overflow, then a tag_overflow exception is generated and
R[rd] and the integer condition codes remain unchanged. If a TSUBccTV does not
cause a tag overflow condition, the difference is written into R[rd] and the integer
condition codes are updated. CCR.icc.v is set to 0 to indicate no 32-bit overflow.


When TSUBccTV does not trap, the remaining integer condition codes (both the other
CCR.icc bits and all the CCR.xcc bits) are also updated as they would be for a normal
subtract instruction. In particular, the setting of the CCR.xcc.v bit is not determined
by the tag overflow condition (tag overflow is used only to set the 32-bit overflow bit).
CCR.xcc.v is set only on the basis of the normal 64-bit arithmetic overflow condition,
like a normal 64-bit subtract.


SPARC V8 Compatibility Note: TSUBccTV traps based on the 32-bit overflow condition,
just as in the SPARC V8 architecture. Although the tagged subtract instructions set
the 64-bit condition codes CCR.xcc, there is no form of the instruction that traps on
the 64-bit overflow condition.

Exceptions illegal_instruction
tag_overflow

See Also TADDccTV on page 364
TSUBcc on page 369
UDIV, UDIVcc (Deprecated)





7.109 Unsigned Divide (64-bit ÷ 32-bit)


The UDIV and UDIVcc instructions are deprecated and should not be used in 
new software. The UDIVX instruction should be used instead. 





Opcode op3 Operation Assembly Language Syntax Class
UDIV 00 1110 Unsigned Integer Divide udiv reg_rs1, reg_or_imm, reg_rd D2
UDIVcc 01 1110 Unsigned Integer Divide and modify cc's udivcc reg_rs1, reg_or_imm, reg_rd D2





[Figure: instruction format, with fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, and 4:0]

Description The unsigned divide instructions perform 64-bit by 32-bit division, producing a
32-bit result. If i = 0, they compute "(Y :: R[rs1]{31:0}) ÷ R[rs2]{31:0}". Otherwise (that
is, if i = 1), the divide instructions compute "(Y :: R[rs1]{31:0}) ÷
(sign_ext(simm13){31:0})". In either case, if overflow does not occur, the less
significant 32 bits of the integer quotient are zero-extended to 64 bits and are
written into R[rd].
The contents of the Y register are undefined after any 64-bit by 32-bit integer divide 
operation. 

Unsigned Divide 


Unsigned divide (UDIV, UDIVcc) assumes an unsigned integer doubleword
dividend (Y :: R[rs1]{31:0}) and an unsigned integer word divisor R[rs2]{31:0} or
(sign_ext(simm13){31:0}) and computes an unsigned integer word quotient (R[rd]).
Because the sign-extended immediate is then treated as an unsigned 32-bit value,
immediate values in simm13 are in the ranges 0 to 2^12 - 1 and 2^32 - 2^12 to 2^32 - 1
for unsigned divide instructions.


Unsigned division rounds an inexact rational quotient toward zero. 


Programming Note: The rational quotient is the infinitely precise result quotient. It
includes both the integer part and the fractional part of the result. For example, the
rational quotient of 11/4 is 2.75 (integer part = 2, fractional part = 0.75).
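A minimal C sketch of UDIV and UDIVcc that gathers the rules of this section: the Y :: R[rs1]{31:0} dividend, the low 32 bits of the divisor, truncation toward zero, the overflow rule of TABLE 7-15, zero-extension of the result, and the UDIVcc condition codes. Treating a zero divisor as the case that raises division_by_zero is an assumption, since the exception is listed without its trigger being spelled out. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t rd;                   /* value written to R[rd]          */
    bool icc_n, icc_z, icc_v;      /* icc.c is always 0               */
    bool xcc_n, xcc_z;             /* xcc.v and xcc.c are always 0    */
} udiv_result;

/* operand2 is R[rs2] (i = 0) or sign_ext(simm13) (i = 1); only its low
 * 32 bits are used. Returns false for a zero divisor. */
bool udiv_model(uint32_t y, uint64_t rs1_val, uint64_t operand2, udiv_result *out)
{
    uint64_t dividend = ((uint64_t)y << 32) | (uint32_t)rs1_val;  /* Y :: R[rs1]{31:0} */
    uint32_t divisor = (uint32_t)operand2;
    if (divisor == 0)
        return false;                      /* assumed division_by_zero */
    uint64_t q = dividend / divisor;       /* truncates toward zero */
    bool overflow = q > UINT32_MAX;        /* rational quotient >= 2^32 */
    out->rd = overflow ? UINT32_MAX : q;   /* 32-bit result, zero-extended */
    out->icc_n = (out->rd >> 31) & 1;
    out->icc_z = (uint32_t)out->rd == 0;
    out->icc_v = overflow;
    out->xcc_n = (out->rd >> 63) & 1;      /* always 0 after zero-extension */
    out->xcc_z = out->rd == 0;
    return true;
}
```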
The result of an unsigned divide instruction can overflow the less significant 32 bits
of the destination register R[rd] under certain conditions. When overflow occurs, the
largest appropriate unsigned integer is returned as the quotient in R[rd]. The
condition under which overflow occurs and the value returned in R[rd] under this
condition are specified in TABLE 7-15.


TABLE 7-15 UDIV / UDIVcc Overflow Detection and Value Returned

Condition Under Which Overflow Occurs    Value Returned in R[rd]
Rational quotient ≥ 2^32                 2^32 - 1 (0000 0000 FFFF FFFF₁₆)


When no overflow occurs, the 32-bit result is zero-extended to 64 bits and written
into register R[rd].


UDIV does not affect the condition code bits. UDIVcc writes the integer condition
code bits as shown in the following table. Note that negative (N) and zero (Z) are set
according to the value of R[rd] after it has been set to reflect overflow, if any.


Bit      Effect on bit of UDIVcc instruction
icc.n    Set if R[rd]{31} = 1
icc.z    Set if R[rd]{31:0} = 0
icc.v    Set if overflow (per TABLE 7-15)
icc.c    Zero
xcc.n    Set if R[rd]{63} = 1
xcc.z    Set if R[rd]{63:0} = 0
xcc.v    Zero
xcc.c    Zero


An attempt to execute a UDIV or UDIVcc instruction when i = 0 and instruction bits
12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
division_by_zero


See Also RDY on page 303
SDIV[cc] on page 321
UMUL[cc] on page 374
UMUL, UMULcc (Deprecated)





7.110 Unsigned Multiply (32-bit) 


The UMUL and UMULcc instructions are deprecated and should not be used in 
new software. The MULX instruction should be used instead. 





Opcode op3 Operation Assembly Language Syntax Class 
UMUL^D      001010  Unsigned Integer Multiply                   umul   reg_rs1, reg_or_imm, reg_rd   D2
UMULcc^D    011010  Unsigned Integer Multiply and modify cc's   umulcc reg_rs1, reg_or_imm, reg_rd   D2





Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | i = 0: bits 12:5 reserved, rs2 (4:0); i = 1: simm13 (12:0)


Description The unsigned multiply instructions perform 32-bit by 32-bit multiplications,
producing 64-bit results. They compute "R[rs1]{31:0} × R[rs2]{31:0}" if i = 0, or
"R[rs1]{31:0} × sign_ext(simm13){31:0}" if i = 1. They write the 32 most significant
bits of the product into the Y register and all 64 bits of the product into R[rd].


Unsigned multiply instructions (UMUL, UMULcc) operate on unsigned integer 
word operands and compute an unsigned integer doubleword product. 
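A minimal C sketch of the UMUL data path described above, assuming register contents are passed in as integers (the name `umul_model` is hypothetical). Only the low 32 bits of each operand participate; the full 64-bit product is returned as the R[rd] value and its upper half is also stored to Y.

```c
#include <stdint.h>

/* Sketch of UMUL: op2 is R[rs2] (i = 0) or sign_ext(simm13) (i = 1).
 * The low words are multiplied as unsigned values; all 64 product bits
 * go to R[rd] and product{63:32} goes to Y. */
static uint64_t umul_model(uint64_t rs1, uint64_t op2, uint32_t *y)
{
    uint64_t product = (uint64_t)(uint32_t)rs1 * (uint32_t)op2;
    *y = (uint32_t)(product >> 32);
    return product;
}
```

Because the operands are unsigned words, an immediate of −1 behaves as 2^32 − 1: multiplying 2 by it gives 0000 0001 FFFF FFFE₁₆ with Y = 1.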


UMUL does not affect the condition code bits. UMULcc writes the integer condition 
code bits, icc and xcc, as shown below. 


Bit      Effect on bit by execution of UMULcc
icc.n    Set to 1 if product{31} = 1; otherwise, set to 0
icc.z    Set to 1 if product{31:0} = 0; otherwise, set to 0
icc.v    Set to 0
icc.c    Set to 0
xcc.n    Set to 1 if product{63} = 1; otherwise, set to 0
xcc.z    Set to 1 if product{63:0} = 0; otherwise, set to 0
xcc.v    Set to 0
xcc.c    Set to 0





Note | 32-bit negative (icc.n) and zero (icc.z) condition codes are set 
according to the less significant word of the product, not 
according to the full 64-bit result. 


374 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


Programming Note | 32-bit overflow after UMUL or UMULcc is indicated by Y ≠ 0.
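The UMULcc flag rules and the overflow note above can be sketched in C as follows. The type and function names are hypothetical. Note that icc.v stays 0 even when the product needs more than 32 bits, so software that cares about 32-bit overflow tests Y instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* One integer condition-code set (icc or xcc). */
typedef struct { bool n, z, v, c; } cc_t;

/* UMULcc: icc reflects product{31:0}, xcc reflects product{63:0};
 * v and c are cleared even when the product exceeds 32 bits. */
static void umulcc_flags(uint64_t product, cc_t *icc, cc_t *xcc)
{
    icc->n = (product >> 31) & 1;
    icc->z = (uint32_t)product == 0;
    icc->v = false;
    icc->c = false;
    xcc->n = (product >> 63) & 1;
    xcc->z = product == 0;
    xcc->v = false;
    xcc->c = false;
}

/* 32-bit overflow check from the programming note: Y is nonzero. */
static bool umul_overflowed_32(uint32_t y)
{
    return y != 0;
}
```

A product of exactly 2^32 sets icc.z (its low word is zero) while xcc.z stays clear, and the nonzero Y reports the overflow.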


An attempt to execute a UMUL or UMULcc instruction when i = 0 and instruction
bits 12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction


See Also MULScc on page 285
RDY on page 303
SMUL[cc] on page 329
UDIV[cc] on page 372
WRasr


7.111 Write Ancillary State Register 


Instruction        rd      Operation                                                      Assembly Language Syntax               Class
WRY^D              0       Write Y register (deprecated)                                  wr reg_rs1, reg_or_imm, %y             D1
                   1       Reserved
WRCCR              2       Write Condition Codes register                                 wr reg_rs1, reg_or_imm, %ccr           A1
WRASI              3       Write ASI register                                             wr reg_rs1, reg_or_imm, %asi           A1
                   4       Reserved (read-only ASR (TICK))
                   5       Reserved (read-only ASR (PC))
WRFPRS             6       Write Floating-Point Registers Status register                 wr reg_rs1, reg_or_imm, %fprs          A1
                   7-14    Reserved
                   15      Software-initiated reset (see Software-Initiated Reset on page 326)
WRPCR^P            16      Write Performance Control register (PCR)                       wr reg_rs1, reg_or_imm, %pcr           A1
WRPIC^PPIC         17      Write Performance Instrumentation Counters (PIC)               wr reg_rs1, reg_or_imm, %pic           A1
                   18      Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
WRGSR              19      Write General Status register (GSR)                            wr reg_rs1, reg_or_imm, %gsr           A1
WRSOFTINT_SET^P    20      Set bits of per-virtual processor Soft Interrupt register      wr reg_rs1, reg_or_imm, %softint_set   N1
WRSOFTINT_CLR^P    21      Clear bits of per-virtual processor Soft Interrupt register    wr reg_rs1, reg_or_imm, %softint_clr   N1
WRSOFTINT^P        22      Write per-virtual processor Soft Interrupt register            wr reg_rs1, reg_or_imm, %softint       N1
WRTICK_CMPR^P      23      Write Tick Compare register                                    wr reg_rs1, reg_or_imm, %tick_cmpr     N1
WRSTICK^H          24      Write System Tick register                                     wr reg_rs1, reg_or_imm, %stick†        N1
WRSTICK_CMPR^P     25      Write System Tick Compare register                             wr reg_rs1, reg_or_imm, %stick_cmpr†   N1
                   26      Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
                   27      Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
                   28      Implementation dependent (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
                   29-31   Implementation dependent (impl. dep. #8-V8-Cs20, #9-V8-Cs20)

† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and %sys_tick_cmpr, which are
now deprecated. Over time, assemblers will support the new %stick and %stick_cmpr names for these registers (which are consistent
with %tick and %tick_cmpr). In the meantime, some existing assemblers may only recognize the original names.


Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | i = 0: bits 12:5 reserved, rs2 (4:0); i = 1: simm13 (12:0)


Description The WRasr instructions each store a value to the writable fields of the ancillary state
register (ASR) specified by rd.


The value stored by these instructions (other than the implementation-dependent
variants) is as follows: if i = 0, store the value "R[rs1] xor R[rs2]"; if i = 1, store
"R[rs1] xor sign_ext(simm13)".


Note | The operation is exclusive-or. 
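Because the combining operation is exclusive-or rather than addition, it can be worth checking a few values by hand. The C sketch below computes the value a WRasr writes, given register contents as plain integers (the function name `wr_value` is hypothetical; instruction decoding is not modelled). WRHPR and WRPR, described later, store the value formed by the same rule.

```c
#include <stdint.h>

/* Sketch of the value WRasr writes: R[rs1] xor R[rs2] when i = 0,
 * R[rs1] xor sign_ext(simm13) when i = 1. */
static uint64_t wr_value(uint64_t rs1_val, unsigned i, uint64_t rs2_val,
                         uint32_t simm13_field)
{
    if (i == 0)
        return rs1_val ^ rs2_val;

    int64_t imm = (int64_t)(simm13_field & 0x1FFFu);
    if (imm & 0x1000)
        imm -= 0x2000;               /* sign-extend from bit 12 */
    return rs1_val ^ (uint64_t)imm;  /* exclusive-or, not addition */
}
```

Since R[0] (%g0) always reads as zero, an instruction such as `wr %g0, 0x82, %asi` stores 0x82 unchanged. With a nonzero rs1 the xor matters: R[rs1] = 3 and an immediate of 1 store 2, not 4.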


The WRasr instruction with rd = 0 is a (deprecated) WRY instruction (which should
not be used in new software). WRY is not a delayed-write instruction; the instruction
immediately following a WRY observes the new value of the Y register.


The WRY instruction is deprecated. It is recommended that all instructions that 
reference the Y register be avoided. 





WRCCR, WRFPRS, and WRASI are not delayed-write instructions. The instruction 
immediately following a WRCCR, WRFPRS, or WRASI observes the new value of 
the CCR, FPRS, or ASI register. 


WRFPRS waits for any pending floating-point operations to complete before writing 
the FPRS register. 


IMPL. DEP. #48-V8-Cs20: WRasr instructions with rd in the range 26-31 are
available for implementation-dependent uses (impl. dep. #8-V8-Cs20). For a WRasr
instruction with rd in the range 26-31, the following are implementation dependent:

• the interpretation of bits 18:0 in the instruction
• the operation(s) performed (for example, xor) to generate the value written to the
  ASR
• whether the instruction is nonprivileged, privileged, or hyperprivileged (impl.
  dep. #9-V8-Cs20), and
• whether an attempt to execute the instruction causes an illegal_instruction
  exception.
Note | See the section "Read/Write Ancillary State Registers (ASRs)" in
Extending the UltraSPARC Architecture, contained in the separate 
volume UltraSPARC Architecture Application Notes, for a 
discussion of extending the SPARC V9 instruction set by means of 
read/write ASR instructions. 


V9 Compatibility Notes | Ancillary state registers may include (for example) timer,
counter, diagnostic, self-test, and trap-control registers.

The SPARC V8 WRIER, WRPSR, WRWIM, and WRTBR instructions do not exist in the
UltraSPARC Architecture because the IER, PSR, TBR, and WIM registers do not exist
in the UltraSPARC Architecture.





See Ancillary State Registers on page 70 for more detailed information regarding ASR 
registers. 


Exceptions. An attempt to execute a WRasr instruction when any of the following
conditions exist causes an illegal_instruction exception:

• i = 0 and instruction bits 12:5 are nonzero
• rd = 1, 4, 5, 7-14, 18, or 26-31
• rd = 15 and ((rs1 ≠ 0) or (i = 0))
• the instruction is WRSTICK and the virtual processor is not in hyperprivileged
  mode (HPSTATE.hpriv = 0)

An attempt to execute a WRPCR (impl. dep. #250-U3-Cs10), WRSOFTINT_SET,
WRSOFTINT_CLR, WRSOFTINT, WRTICK_CMPR, or WRSTICK_CMPR instruction
in nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0) causes a
privileged_opcode exception.

If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a WRGSR instruction causes an
fp_disabled exception.

An attempt to execute a WRPIC instruction in nonprivileged mode (PSTATE.priv = 0
and HPSTATE.hpriv = 0) when PCR.priv = 1 causes a privileged_action exception.


Implementation Note | The SIR instruction shares an opcode with WRasr; they are
distinguished by the rd, rs1, and i fields (rd = 15, rs1 = 0, and i = 1 for SIR). See
Software-Initiated Reset on page 326.
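The illegal_instruction conditions listed under Exceptions, together with the SIR encoding in the implementation note, can be collected into a small decoder. This is a hypothetical sketch: it assumes the rd, rs1, i, and bits 12:5 fields have already been extracted, treats rd = 26-31 as illegal as the Exceptions list states (impl. dep. #48 lets an implementation define otherwise), and does not model the privileged_opcode, fp_disabled, or privileged_action checks.

```c
#include <stdbool.h>

/* rd = 15, rs1 = 0, i = 1 selects SIR rather than a WRasr. */
static bool wrasr_is_sir(unsigned rd, unsigned rs1, unsigned i)
{
    return rd == 15 && rs1 == 0 && i == 1;
}

/* Sketch of the WRasr illegal_instruction conditions; hpriv is HPSTATE.hpriv. */
static bool wrasr_illegal(unsigned rd, unsigned rs1, unsigned i,
                          unsigned bits12_5, bool hpriv)
{
    if (i == 0 && bits12_5 != 0)
        return true;
    if (rd == 1 || rd == 4 || rd == 5 || (rd >= 7 && rd <= 14) ||
        rd == 18 || rd >= 26)
        return true;                 /* rd >= 26 covers 26-31 */
    if (rd == 15 && !wrasr_is_sir(rd, rs1, i))
        return true;                 /* rd = 15 with rs1 != 0 or i = 0 */
    if (rd == 24 && !hpriv)
        return true;                 /* WRSTICK outside hyperprivileged mode */
    return false;
}
```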


Exceptions illegal_instruction
privileged_opcode
fp_disabled
privileged_action
See Also RDasr on page 303
WRHPR on page 380 
WRPR on page 382 
WRHPR





7.112 Write Hyperprivileged Register 


Instruction   op3      Operation                          rd      Assembly Language Syntax                       Class
WRHPR         110011   Write hyperprivileged register                                                           N1
                       HPSTATE                            0       wrhpr reg_rs1, reg_or_imm, %hpstate
                       HTSTATE                            1       wrhpr reg_rs1, reg_or_imm, %htstate
                       Reserved                           2
                       HINTP                              3       wrhpr reg_rs1, reg_or_imm, %hintp
                       Reserved                           4
                       HTBA                               5       wrhpr reg_rs1, reg_or_imm, %htba
                       Reserved                           6-29
                       Reserved                           30
                       HSTICK_CMPR                        31      wrhpr reg_rs1, reg_or_imm, %hsys_tick_cmpr


Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | i = 0: bits 12:5 reserved, rs2 (4:0); i = 1: simm13 (12:0)


Description A WRHPR instruction stores the value "R[rs1] xor R[rs2]" if i = 0, or "R[rs1] xor
sign_ext(simm13)" if i = 1, to the writable fields of the specified hyperprivileged
state register.


Note | The operation is exclusive-or. 


The rd field in the instruction determines the hyperprivileged register that is written. 
There are MAXTL copies of the HTSTATE register, one for each trap level. A write to 
one of these registers sets the copy of HTSTATE indexed by the current value in the 
trap-level register (TL). 


The WRHPR instruction is a non-delayed-write instruction. The instruction
immediately following the WRHPR observes any changes to virtual processor state
made by the WRHPR.


An attempt to execute a WRHPR instruction when any of the following conditions
exist causes an illegal_instruction exception:

• i = 0 and instruction bits 12:5 are nonzero
• rd = 2, 4, or 6-30 (reserved for future versions of the architecture)
• rd = 1 and TL = 0 (write to HTSTATE when the trap level is zero)
• the virtual processor is in nonprivileged or privileged mode (HPSTATE.hpriv = 0)


A trap_level_zero disrupting trap can occur upon the completion of a WRHPR
instruction to HPSTATE, if the following three conditions are true after WRHPR has
executed:

• trap_level_zero exceptions are enabled (HPSTATE.tlz = 1),
• the virtual processor is in nonprivileged or privileged mode
  (HPSTATE.hpriv = 0), and
• the trap level (TL) register's value is zero (TL = 0).
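The two rule sets above (the illegal_instruction conditions and the trap_level_zero condition) are simple predicates, sketched here in C. The function names are hypothetical; hpriv stands for HPSTATE.hpriv and tl for the current trap level, and the trap_level_zero arguments are the values in effect after the WRHPR has executed.

```c
#include <stdbool.h>

/* Sketch of the WRHPR illegal_instruction conditions. */
static bool wrhpr_illegal(unsigned rd, unsigned i, unsigned bits12_5,
                          unsigned tl, bool hpriv)
{
    if (!hpriv)
        return true;                 /* nonprivileged or privileged mode */
    if (i == 0 && bits12_5 != 0)
        return true;
    if (rd == 2 || rd == 4 || (rd >= 6 && rd <= 30))
        return true;                 /* reserved rd values */
    if (rd == 1 && tl == 0)
        return true;                 /* write to HTSTATE while TL = 0 */
    return false;
}

/* After a WRHPR to HPSTATE completes: is a trap_level_zero trap indicated? */
static bool tlz_trap_pending(bool hpstate_tlz, bool hpstate_hpriv, unsigned tl)
{
    return hpstate_tlz && !hpstate_hpriv && tl == 0;
}
```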


Programming Note | Execution of a WRHPR instruction that causes the value of
HPSTATE.hpriv to change from 1 to 0 is not guaranteed to work if the WRHPR is in
the delay slot of a DCTI instruction. Therefore, it is recommended that WRHPR not
be executed in a delay slot, especially if it will change the value of HPSTATE.hpriv
to 0.


Programming Note | For historical reasons, the WRPR instruction, not WRHPR, is used
to write to the hyperprivileged TICK register. See Write Privileged Register on
page 382.


Exceptions illegal_instruction
trap_level_zero


See Also RDHPR on page 306 
WRasr on page 376 
WRPR on page 382 
WRPR





7.113 Write Privileged Register 


Instruction   op3      Operation                   rd      Assembly Language Syntax           Class
WRPR^P        110010   Write Privileged register                                               A1
                       TPC                         0       wrpr reg_rs1, reg_or_imm, %tpc
                       TNPC                        1       wrpr reg_rs1, reg_or_imm, %tnpc
                       TSTATE                      2       wrpr reg_rs1, reg_or_imm, %tstate
                       TT                          3       wrpr reg_rs1, reg_or_imm, %tt
                       TICK                        4       wrpr reg_rs1, reg_or_imm, %tick
                       TBA                         5       wrpr reg_rs1, reg_or_imm, %tba
                       PSTATE                      6       wrpr reg_rs1, reg_or_imm, %pstate
                       TL                          7       wrpr reg_rs1, reg_or_imm, %tl
                       PIL                         8       wrpr reg_rs1, reg_or_imm, %pil
                       CWP                         9       wrpr reg_rs1, reg_or_imm, %cwp
                       CANSAVE                     10      wrpr reg_rs1, reg_or_imm, %cansave
                       CANRESTORE                  11      wrpr reg_rs1, reg_or_imm, %canrestore
                       CLEANWIN                    12      wrpr reg_rs1, reg_or_imm, %cleanwin
                       OTHERWIN                    13      wrpr reg_rs1, reg_or_imm, %otherwin
                       WSTATE                      14      wrpr reg_rs1, reg_or_imm, %wstate
                       Reserved                    15
                       GL                          16      wrpr reg_rs1, reg_or_imm, %gl
                       Reserved                    17-31


Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | i = 0: bits 12:5 reserved, rs2 (4:0); i = 1: simm13 (12:0)


Description This instruction stores the value "R[rs1] xor R[rs2]" if i = 0, or "R[rs1] xor
sign_ext(simm13)" if i = 1, to the writable fields of the specified privileged state
register.


Note | The operation is exclusive-or. 


The rd field in the instruction determines the privileged register that is written.
There are MAXTL copies of the TPC, TNPC, TT, and TSTATE registers, one for each
trap level. A write to one of these registers sets the copy indexed by the current
value in the trap-level register (TL).

A WRPR to TL only stores a value to TL; it does not cause a trap, cause a return from
a trap, or alter any machine state other than TL and state (such as PC, NPC, TICK,
etc.) that is indirectly modified by every instruction.


Programming Note | A WRPR of TL can be used to read the values of TPC, TNPC, and
TSTATE for any trap level; however, software must take care that traps do not occur
while the TL register is modified.


The WRPR instruction is a non-delayed-write instruction. The instruction
immediately following the WRPR observes any changes to virtual processor state
made by the WRPR.


In privileged mode, MAXPTL is the maximum value that may be written by a WRPR to
TL; an attempt to write a larger value results in MAXPTL being written to TL. In
hyperprivileged mode, MAXTL is the maximum value that may be written by a WRPR
to TL; an attempt to write a larger value results in MAXTL being written to TL. For
details, see TABLE 5-22 on page 101.


In privileged mode, MAXPGL is the maximum value that may be written by a WRPR to
GL; an attempt to write a larger value results in MAXPGL being written to GL. In
hyperprivileged mode, MAXGL is the maximum value that may be written by a WRPR
to GL; an attempt to write a larger value results in MAXGL being written to GL. For
details, see TABLE 5-23 on page 104.
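The saturating behavior of a WRPR to TL or GL can be sketched as a mode-dependent clamp. This is a minimal sketch under stated assumptions: the function name is hypothetical, the limits are passed in because their values are given in TABLE 5-22 and TABLE 5-23, and the sketch compares the whole written value against the limit (field-width details are left to those tables).

```c
#include <stdbool.h>
#include <stdint.h>

/* Value a WRPR actually stores in TL (or GL). max_priv stands for MAXPTL
 * (or MAXPGL) and max_hyper for MAXTL (or MAXGL). */
static uint64_t wrpr_clamped(uint64_t value, bool hpriv,
                             uint64_t max_priv, uint64_t max_hyper)
{
    uint64_t limit = hpriv ? max_hyper : max_priv;
    return value > limit ? limit : value;
}
```

With illustrative limits of 2 (privileged) and 6 (hyperprivileged), writing 5 from privileged mode stores 2, while the same write from hyperprivileged mode stores 5.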


Programming Note | For historical reasons, the WRPR instruction, not WRHPR, is used
to write to the hyperprivileged TICK register.


Exceptions. An attempt to execute a WRPR instruction in nonprivileged mode
(PSTATE.priv = 0 and HPSTATE.hpriv = 0) causes a privileged_opcode exception.


An attempt to execute a WRPR instruction when any of the following conditions
exist causes an illegal_instruction exception:

• i = 0 and instruction bits 12:5 are nonzero
• (rd = 4) and (PSTATE.priv = 1 and HPSTATE.hpriv = 0)
  (an attempt to write to hyperprivileged register TICK while in privileged mode)
• rd = 15 or rd = 17-31 (reserved for future versions of the architecture)
• 0 ≤ rd ≤ 3 (attempt to write TPC, TNPC, TSTATE, or TT register) while TL = 0
  (current trap level is zero) and the virtual processor is in privileged or
  hyperprivileged mode.


Implementation Note | In nonprivileged mode, an illegal_instruction exception due to
0 ≤ rd ≤ 3 and TL = 0 does not occur; the privileged_opcode exception occurs instead.
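The WRPR exception rules, including the implementation note, can be summarized as a classification function. The names below are hypothetical; priv is PSTATE.priv, hpriv is HPSTATE.hpriv, and tl is the current trap level. Checking privilege first reproduces the implementation note for 0 ≤ rd ≤ 3 with TL = 0; the relative ordering against the other illegal_instruction causes is an assumption of this sketch, not something stated above.

```c
#include <stdbool.h>

typedef enum {
    WRPR_OK,
    WRPR_PRIVILEGED_OPCODE,
    WRPR_ILLEGAL_INSTRUCTION
} wrpr_check_t;

static wrpr_check_t wrpr_check(unsigned rd, unsigned i, unsigned bits12_5,
                               unsigned tl, bool priv, bool hpriv)
{
    if (!priv && !hpriv)
        return WRPR_PRIVILEGED_OPCODE;    /* nonprivileged mode */
    if (i == 0 && bits12_5 != 0)
        return WRPR_ILLEGAL_INSTRUCTION;
    if (rd == 4 && !hpriv)
        return WRPR_ILLEGAL_INSTRUCTION;  /* TICK written from privileged mode */
    if (rd == 15 || rd >= 17)
        return WRPR_ILLEGAL_INSTRUCTION;  /* reserved rd values */
    if (rd <= 3 && tl == 0)
        return WRPR_ILLEGAL_INSTRUCTION;  /* TPC, TNPC, TSTATE, TT at TL = 0 */
    return WRPR_OK;
}
```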


A trap_level_zero disrupting trap can occur upon the completion of a WRPR
instruction to TL, if the following three conditions are true after WRPR has executed:
• trap_level_zero exceptions are enabled (HPSTATE.tlz = 1),
• the virtual processor is in nonprivileged or privileged mode
  (HPSTATE.hpriv = 0), and
• the trap level (TL) register's value is zero (TL = 0).


Exceptions privileged_opcode
illegal_instruction
trap_level_zero


See Also RDPR on page 307 
WRasr on page 376 
WRHPR on page 380 
7.114 XOR Logical Operation


Instruction   op3       Operation                        Assembly Language Syntax              Class
XOR           00 0011   Exclusive or                     xor    reg_rs1, reg_or_imm, reg_rd
XORcc         01 0011   Exclusive or and modify cc's     xorcc  reg_rs1, reg_or_imm, reg_rd
XNOR          00 0111   Exclusive nor                    xnor   reg_rs1, reg_or_imm, reg_rd
XNORcc        01 0111   Exclusive nor and modify cc's    xnorcc reg_rs1, reg_or_imm, reg_rd


Instruction format: op = 10 (bits 31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i (13) | i = 0: bits 12:5 reserved, rs2 (4:0); i = 1: simm13 (12:0)


Description These instructions implement bitwise logical xor operations. They compute "R[rs1]
op R[rs2]" if i = 0, or "R[rs1] op sign_ext(simm13)" if i = 1, and write the result into
R[rd].


XORcc and XNORcc modify the integer condition codes (icc and xcc). They set the
condition codes as follows:

• icc.v, icc.c, xcc.v, and xcc.c are set to 0
• icc.n is copied from bit 31 of the result
• xcc.n is copied from bit 63 of the result
• icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0)
• xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0)


Programming Note | XNOR (and XNORcc) is identical to the xor_not (and set condition
codes) xor_not_cc logical operation, respectively.


An attempt to execute an XOR, XORcc, XNOR, or XNORcc instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
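The operations and condition-code rules of this section can be sketched in C. The function and type names are hypothetical; op2 stands for R[rs2] or sign_ext(simm13), already widened to 64 bits. XNOR is written as xor_not, matching the programming note.

```c
#include <stdbool.h>
#include <stdint.h>

/* One integer condition-code set (icc or xcc). */
typedef struct { bool n, z, v, c; } cc_t;

static uint64_t xor_op(uint64_t rs1, uint64_t op2)  { return rs1 ^ op2; }
static uint64_t xnor_op(uint64_t rs1, uint64_t op2) { return rs1 ^ ~op2; } /* xor_not */

/* Condition codes written by XORcc and XNORcc. */
static void xorcc_flags(uint64_t result, cc_t *icc, cc_t *xcc)
{
    icc->n = (result >> 31) & 1;
    xcc->n = (result >> 63) & 1;
    icc->z = (uint32_t)result == 0;
    xcc->z = result == 0;
    icc->v = icc->c = xcc->v = xcc->c = false;
}
```

Since xnor with %g0 as the second operand yields the bitwise complement of R[rs1], this is also how the conventional `not` synthetic instruction is usually expanded.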
CHAPTER 8


IEEE Std 754-1985 Requirements for 
UltraSPARC Architecture 2005 


The IEEE Std 754-1985 floating-point standard contains a number of implementation 
dependencies. This chapter specifies choices for these implementation dependencies, 
to ensure that SPARC V9 implementations are as consistent as possible. 


The chapter contains these major sections: 


Traps Inhibiting Results on page 387. 
Underflow Behavior on page 388. 

Integer Overflow Definition on page 389. 
Floating-Point Nonstandard Mode on page 390. 
Arithmetic Result Tables on page 390. 


Exceptions are discussed in this chapter on the assumption that instructions are
implemented in hardware. If an instruction is implemented in software, it may not
trigger hardware exceptions, but its behavior as observed by nonprivileged software
(other than timing) must be the same as if it were implemented in hardware.





8.1 Traps Inhibiting Results


As described in Floating-Point State Register (FSR) on page 61 and elsewhere, when a 
floating-point trap occurs, the following conditions are true: 


• The destination floating-point register(s) (the F registers) are unchanged.
• The floating-point condition codes (fcc0, fcc1, fcc2, and fcc3) are unchanged.
• The FSR.aexc (accrued exceptions) field is unchanged.
• The FSR.cexc (current exceptions) field is unchanged except for
  IEEE 754 exceptions; in that case, cexc contains a bit set to 1, corresponding to
  the exception that caused the trap. Only one bit shall be set in cexc.




Instructions causing an fp_exception_other trap because of unfinished or
unimplemented FPops execute as if by hardware; that is, such a trap is undetectable
by application software, except that timing may be affected.


Programming Note | A user-mode trap handler invoked for an IEEE 754 exception,
whether as a direct result of a hardware fp_exception_ieee_754 trap or as an
indirect result of privileged software handling of an fp_exception_other trap with
FSR.ftt = unfinished_FPop or FSR.ftt = unimplemented_FPop, can rely on the
following behavior:

• The address of the instruction that caused the exception will be available.
• The destination floating-point register(s) are unchanged from their state prior
  to that instruction's execution.
• The floating-point condition codes (fcc0, fcc1, fcc2, and fcc3) are unchanged.
• The FSR.aexc field is unchanged.
• The FSR.cexc field contains exactly one bit set to 1, corresponding to the
  exception that caused the trap.
• The FSR.ftt, FSR.qne, and reserved fields of FSR are zero.





8.2 Underflow Behavior

An UltraSPARC Architecture virtual processor detects tininess before rounding
occurs (impl. dep. #55-V8-Cs10).


TABLE 8-1 summarizes what happens when an exact unrounded value u satisfying

$$0 \le |u| \le \text{smallest normalized number}$$

would round, if no trap intervened, to a rounded value r which might be zero,
subnormal, or the smallest normalized value.
TABLE 8-1 Floating-Point Underflow Behavior (Tininess Detected Before Rounding)

                             Underflow trap:   ufm = 1   ufm = 0   ufm = 0
                             Inexact trap:     nxm = x   nxm = 1   nxm = 0
u = r    r is minimum normal                   None      None      None
         r is subnormal                        UF        None      None
         r is zero                             None      None      None
u ≠ r    r is minimum normal                   UF        NX        uf nx
         r is subnormal                        UF        NX        uf nx
         r is zero                             UF        NX        uf nx

UF = fp_exception_ieee_754 trap with cexc.ufc = 1
NX = fp_exception_ieee_754 trap with cexc.nxc = 1
uf = cexc.ufc = 1, aexc.ufa = 1, no fp_exception_ieee_754 trap
nx = cexc.nxc = 1, aexc.nxa = 1, no fp_exception_ieee_754 trap


8.2.1 Trapped Underflow Definition (ufm = 1)

Since tininess is detected before rounding, trapped underflow occurs when the exact
unrounded result has magnitude between zero and the smallest normalized number
in the destination format.


Note | The wrapped exponent results intended to be delivered on
trapped underflows and overflows in IEEE 754 are irrelevant to
the UltraSPARC Architecture at the hardware, hyperprivileged,
and privileged software levels. If they are created at all, it
would be by user software in a nonprivileged-mode trap
handler.


8.2.2 Untrapped Underflow Definition (ufm = 0)

Untrapped underflow occurs when the exact unrounded result has magnitude
between zero and the smallest normalized number in the destination format and the
correctly rounded result in the destination format is inexact.
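The trapped and untrapped underflow definitions above reduce to a small decision procedure. The C sketch below is illustrative only: the enum and function names are hypothetical, and it assumes the caller has already determined whether u is tiny (0 < |u| < smallest normal, with tininess detected before rounding) and whether u = r. Within the range covered by TABLE 8-1, a non-tiny u is zero or the minimum normal, both of which are exact.

```c
#include <stdbool.h>

/* Outcomes named after the TABLE 8-1 legend. */
typedef enum {
    OUTCOME_NONE,     /* no exception */
    OUTCOME_UF_TRAP,  /* fp_exception_ieee_754 trap, cexc.ufc = 1 */
    OUTCOME_NX_TRAP,  /* fp_exception_ieee_754 trap, cexc.nxc = 1 */
    OUTCOME_UF_NX     /* cexc.ufc/nxc and aexc.ufa/nxa set, no trap */
} underflow_outcome_t;

/* tiny: 0 < |u| < smallest normal; exact: u == r; ufm, nxm: FSR.tem bits. */
static underflow_outcome_t underflow_outcome(bool tiny, bool exact,
                                             bool ufm, bool nxm)
{
    if (!tiny)
        return OUTCOME_NONE;      /* u is zero or the minimum normal */
    if (ufm)
        return OUTCOME_UF_TRAP;   /* trapped underflow, exact or not */
    if (exact)
        return OUTCOME_NONE;      /* untrapped underflow requires inexact */
    return nxm ? OUTCOME_NX_TRAP : OUTCOME_UF_NX;
}
```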





8.3 Integer Overflow Definition


• F<sdq>TOi: When a NaN, infinity, large positive argument ≥ 2^31, or large
  negative argument ≤ −(2^31 + 1) is converted to an integer, the invalid_current
  (nvc) bit of FSR.cexc is set to 1, and if the floating-point invalid trap is enabled
  (FSR.tem.nvm = 1), the fp_exception_ieee_754 exception is raised. If the
  floating-point invalid trap is disabled (FSR.tem.nvm = 0), no trap occurs and a
  numerical result is generated: if the sign bit of the operand is 0, the result is
  2^31 − 1; if the sign bit of the operand is 1, the result is −2^31.

• F<sdq>TOx: When a NaN, infinity, large positive argument ≥ 2^63, or large
  negative argument ≤ −(2^63 + 1) is converted to an extended integer, the
  invalid_current (nvc) bit of FSR.cexc is set to 1, and if the floating-point invalid
  trap is enabled (FSR.tem.nvm = 1), the fp_exception_ieee_754 exception is
  raised. If the floating-point invalid trap is disabled (FSR.tem.nvm = 0), no trap
  occurs and a numerical result is generated: if the sign bit of the operand is 0, the
  result is 2^63 − 1; if the sign bit of the operand is 1, the result is −2^63.
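The F<sdq>TOi rule above can be checked against a small model of the double-precision case (FdTOi) with the invalid trap disabled. In this sketch the name `fdtoi_model` is hypothetical and NX (inexact) reporting is omitted. It truncates toward zero and, for NaN, infinity, or out-of-range inputs, sets nvc and returns the saturated value chosen by the operand's sign bit.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of FdTOi with FSR.tem.nvm = 0. */
static int32_t fdtoi_model(double x, bool *nvc)
{
    *nvc = isnan(x) || x >= 2147483648.0 || x <= -2147483649.0;
    if (*nvc)                                      /* NaN, infinity, or too large */
        return signbit(x) ? INT32_MIN : INT32_MAX; /* chosen by the sign bit */
    return (int32_t)x;                             /* C truncates toward zero */
}
```

The boundary cases show why the negative threshold is −(2^31 + 1) rather than −2^31: −2147483648.5 truncates to −2^31, which fits, so no invalid exception is signalled.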





8.4 Floating-Point Nonstandard Mode


On an UltraSPARC Architecture 2005 processor, all floating-point operations
produce results that conform to IEEE Std 754, regardless of the setting of the
"nonstandard mode" bit, FSR.ns (impl. dep. #18-V8).





8.5 Arithmetic Result Tables


This section contains detailed tables showing the results produced by various
floating-point operations, depending on their source operands.

Notes on source types:

• Nn is a number in F[rsn], which may be normal or subnormal.
• QNaNn and SNaNn are Quiet and Signaling Not-a-Number values in F[rsn],
  respectively.

Notes on result types:

• R: (rounded) result of operation, which may be normal, subnormal, zero, or
  infinity. May also cause OF, UF, NX, unfinished.
• dQNaN is the generated default Quiet NaN (sign = 0, exponent = all 1s,
  fraction = all 1s). The sign of the default Quiet NaN is zero to distinguish it from
  storage initialized to all ones.
• QSNaNn is the Signaling NaN operand from F[rsn] with the Quiet bit asserted.
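The result-type notes above translate directly into bit patterns. This C sketch assumes, as on SPARC, that the Quiet bit is the most significant fraction bit (bit 22 in single precision, bit 51 in double precision); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Default quiet NaN: sign 0, exponent all 1s, fraction all 1s. */
#define DQNAN_SINGLE UINT32_C(0x7FFFFFFF)
#define DQNAN_DOUBLE UINT64_C(0x7FFFFFFFFFFFFFFF)

/* QSNaN: the signaling NaN operand with its Quiet bit (fraction MSB) set. */
static uint32_t qsnan_single(uint32_t snan_bits)
{
    return snan_bits | (UINT32_C(1) << 22);
}

static uint64_t qsnan_double(uint64_t snan_bits)
{
    return snan_bits | (UINT64_C(1) << 51);
}
```

The zero sign bit is what distinguishes dQNaN from memory filled with all ones (FFFF FFFF FFFF FFFF₁₆), which is also a NaN pattern but has sign 1.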
8.5.1 Floating-Point Add (FADD)


TABLE 8-2 Floating-Point Add operation (F[rs1] + F[rs2])

[Table entries not recoverable from this copy; rows are indexed by F[rs1] and columns by F[rs2].]

* if N1 = −N2, then **
** result is +0 unless rounding mode is round to −∞, in which case the result is −0


For the FADD instructions, R may be any number; its generation may cause OF, UF, 
and/or NX. 


Floating-point add is not commutative when both operands are NaN. 
8.5.2 Floating-Point Subtract (FSUB)


TABLE 8-3 Floating-Point Subtract operation (F[rs1] − F[rs2])

[Table entries not recoverable from this copy; rows are indexed by F[rs1] and columns by F[rs2].]

* if N1 = N2, then **
** result is +0 unless rounding mode is round to −∞, in which case the result is −0


For the FSUB instructions, R may be any number; its generation may cause OF, UF, 
and/or NX. 


Note that −x ≠ 0 − x when x is zero or NaN.
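The zero case of the note above can be observed from C on an IEEE 754 host. In this sketch (function names hypothetical) negation only flips the sign bit, while 0 − x is a real subtraction; for x = +0 in the default rounding mode, −x is −0 but 0 − x is +0. The NaN case is not exercised here because C does not pin down which NaN a subtraction returns.

```c
#include <math.h>

/* -x flips only the sign bit. */
static double negate(double x) { return -x; }

/* 0 - x is a floating-point subtraction; volatile keeps it from being folded. */
static double zero_minus(double x)
{
    volatile double zero = 0.0;
    return zero - x;
}
```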


8.5.3 Floating-Point Multiply 


TABLE 8-4 Floating-Point Multiply operation (F[rs1] × F[rs2])

[Table entries not recoverable from this copy; rows are indexed by F[rs1] and columns by F[rs2].]
R may be any number; its generation may cause OF, UF, and/or NX.
Floating-point multiply is not commutative when both operands are NaN.
FsMULd (FdMULq) never causes OF, UF, or NX.


A NaN input operand to FsMULd (FdMULq) must be widened to produce a
double-precision (quad-precision) NaN output, by filling the least-significant bits of
the NaN result with zeros.


8.5.4 Floating-Point Divide (FDIV) 


TABLE 8-5 Floating-Point Divide operation (F[rs1] ÷ F[rs2])

[Table entries not recoverable from this copy; rows are indexed by F[rs1] and columns by F[rs2].]


R may be any number; its generation may cause OF, UF, and/or NX. 


8.5.5 Floating-Point Square Root (FSQRT)


TABLE 8-6 Floating-Point Square Root operation (√F[rs2])

F[rs2]   −∞          −N2         −0   +0   +N2   +∞   QNaN2   SNaN2
Result   dQNaN, NV   dQNaN, NV   −0   +0   +R    +∞   QNaN2   QSNaN2, NV
R may be any number; its generation may cause NX.


Square root cannot cause DZ, OF, or UF. 


8.5.6 Floating-Point Compare (FCMP, FCMPE) 


TABLE 8-7 Floating-Point Compare (FCMP, FCMPE) operation (F[rs1] ? F[rs2])

[Table entries not recoverable from this copy; rows are indexed by F[rs1] and columns by F[rs2].]

* NV for FCMPE, but not for FCMP. 


TABLE 8-8 FSR.fcc Encoding for Result of FCMP, FCMPE 


fcc result   Meaning
0            F[rs1] = F[rs2]
1            F[rs1] < F[rs2]
2            F[rs1] > F[rs2]
3            unordered


NaN is considered to be unequal to anything else, even the identical NaN bit 
pattern. 


FCMP/FCMPE cannot cause DZ, OF, UF, NX. 
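TABLE 8-8 can be sketched as a function from two doubles to the 2-bit fcc value. The name `fcmp_fcc` is hypothetical, and the sketch does not model the NV exception that distinguishes FCMPE from FCMP. C's `isunordered` macro captures the rule that a NaN compares unequal to everything, including an identical NaN bit pattern.

```c
#include <math.h>

/* Sketch: FSR.fcc value produced by FCMP/FCMPE for F[rs1] = a, F[rs2] = b. */
static unsigned fcmp_fcc(double a, double b)
{
    if (isunordered(a, b))
        return 3;          /* unordered: at least one operand is a NaN */
    if (a == b)
        return 0;          /* equal (IEEE 754 treats +0 and -0 as equal) */
    return a < b ? 1 : 2;  /* 1: F[rs1] < F[rs2]; 2: F[rs1] > F[rs2] */
}
```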
8.5.7 Floating-Point to Floating-Point Conversions
(F<s|d|q>TO<s|d|q>)


TABLE 8-9 Floating-Point to Floating-Point Conversions (convert(F[rs2]))

F[rs2]   −SNaN2        −QNaN2   −∞   −N2   −0   +0   +N2   +∞   +QNaN2   +SNaN2
Result   −QSNaN2, NV   −QNaN2   −∞   −R    −0   +0   +R    +∞   +QNaN2   +QSNaN2, NV

For FsTOd:

• the least-significant fraction bits of a normal number are filled with zero to fit in
  double-precision format
• the least-significant bits of a NaN result operand are filled with zero to fit in
  double-precision format


For FsTOq and FdTOq:

• the least-significant fraction bits of a normal number are filled with zero to fit in
  quad-precision format
• the least-significant bits of a NaN result operand are filled with zero to fit in
  quad-precision format


For FqTOs and FdTOs:

• the fraction is rounded according to the current rounding mode
• the lower-order bits of a NaN source are discarded to fit in single-precision
  format; this discarding is not considered a rounding operation, and will not cause
  an NX exception


For FqTOd:

• the fraction is rounded according to the current rounding mode
• the least-significant bits of a NaN source are discarded to fit in double-precision
  format; this discarding is not considered a rounding operation, and will not cause
  an NX exception
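The NaN widening (FsTOd) and narrowing (FdTOs) rules above can be expressed as bit manipulations on the IEEE single and double encodings. This is a minimal sketch with hypothetical function names. It assumes the input is already a NaN and, as on SPARC, that the Quiet bit is the most significant fraction bit. Forcing that bit on reproduces the QSNaN results of TABLE 8-9 for signaling inputs and is a no-op for quiet inputs.

```c
#include <stdint.h>

/* FsTOd on a NaN: keep the sign, set the exponent to all 1s, and fill the
 * 29 extra low-order fraction bits (52 - 23) with zeros. */
static uint64_t nan_single_to_double(uint32_t s)
{
    uint64_t sign = (uint64_t)(s >> 31) << 63;
    uint64_t frac = (uint64_t)(s & UINT32_C(0x7FFFFF)) << 29;
    return sign | UINT64_C(0x7FF0000000000000) | frac | (UINT64_C(1) << 51);
}

/* FdTOs on a NaN: discard the 29 low-order fraction bits (no rounding, so
 * no NX), keep the sign, and force the Quiet bit on. */
static uint32_t nan_double_to_single(uint64_t d)
{
    uint32_t sign = (uint32_t)(d >> 63) << 31;
    uint32_t frac = (uint32_t)(d >> 29) & UINT32_C(0x7FFFFF);
    return sign | UINT32_C(0x7F800000) | frac | (UINT32_C(1) << 22);
}
```

A quiet single NaN survives a round trip through double unchanged, whereas a double NaN whose payload lies only in the discarded low bits narrows to a payload-free quiet NaN.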
TABLE 8-10 Floating-Point to Floating-Point Conversion Exception Conditions

NV   • SNaN operand

OF   • FdTOs, FqTOs: the input is larger than can be expressed in single precision
     • FqTOd: the input is larger than can be expressed in double precision
     • does not occur during other conversion operations

UF   • FdTOs, FqTOs: the input is smaller than can be expressed in single precision
     • FqTOd: the input is smaller than can be expressed in double precision
     • does not occur during other conversion operations

NX   • FdTOs, FqTOs: the input fraction has more significant bits than can be held in a
       single precision fraction
     • FqTOd: the input fraction has more significant bits than can be held in a double
       precision fraction
     • does not occur during other conversion operations


8.5.8 Floating-Point to Integer Conversions
(F<s|d|q>TO<i|x>)


TABLE 8-11 Floating-Point to Integer Conversions (convert(F[rs2]))

[Table entries not recoverable from this copy; columns are indexed by F[rs2].]


R may be any integer, and may cause NV, NX. 
Float-to-Integer conversions are always treated as round-toward-zero (truncated). 


These operations are invalid (due to integer overflow) under the conditions 
described in Integer Overflow Definition on page 389. 


TABLE 8-12 Floating-point to Integer Conversion Exception Conditions

NV   • SNaN operand
     • QNaN operand
     • ±∞ operand
     • integer overflow

NX   • non-integer source (truncation occurred)
8.5.9 Integer to Floating-Point Conversions
(F<i|x>TO<s|d|q>)


TABLE 8-13 Integer to Floating-Point Conversions (convert(F[rs2]))

F[rs2]   −int   0    +int
Result   −R     +0   +R


R may be any number; its generation may cause NX. 


TABLE 8-14 Floating-Point Conversion Exception Conditions

NX   • FxTOd, FxTOs, FiTOs (possible loss of precision)
     • not applicable to FiTOd, FxTOq, or FiTOq (FSR.cexc will always be cleared)
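Why FxTOd can be inexact while FiTOd cannot comes down to the 53-bit significand of double precision. The C sketch below (function names hypothetical) converts in the default rounding mode and checks whether the value survives the round trip. A volatile temporary keeps the result in double format even on hosts with wider intermediate precision.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if converting v to double loses precision (the NX case for FxTOd). */
static bool fxtod_inexact(int64_t v)
{
    volatile double d = (double)v;
    if (d >= 9223372036854775808.0)  /* rounded up to 2^63: cannot be exact */
        return true;
    return (int64_t)d != v;          /* d is in range, so this cast is defined */
}

/* FiTOd: every 32-bit integer fits in the 53-bit significand, so never inexact. */
static bool fitod_inexact(int32_t v)
{
    volatile double d = (double)v;
    return (int32_t)d != v;
}
```

For example, 2^53 + 1 cannot be represented in double precision and rounds to 2^53, which is exactly the loss of precision that sets NX for FxTOd.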
CHAPTER 9


Memory 





The UltraSPARC Architecture memory models define the semantics of memory 
operations. The instruction set semantics require that loads and stores behave as if 
they are performed in the order in which they appear in the dynamic control flow of 
the program. The actual order in which they are processed by the memory may be 
different. The purpose of the memory models is to specify what constraints, if any, 
are placed on the order of memory operations. 


The memory models apply both to uniprocessor and to shared memory 
multiprocessors. Formal memory models are necessary for precise definitions of the 
interactions between multiple virtual processors and input/output devices in a 
shared memory configuration. Programming shared memory multiprocessors 
requires a detailed understanding of the operative memory model and the ability to 
specify memory operations at a low level in order to build programs that can safely 
and reliably coordinate their activities. For additional information on the use of the 
models in programming real systems, see Programming with the Memory Models, 
contained in the separate volume UltraSPARC Architecture Application Notes. 


This chapter contains a great deal of theoretical information so that the discussion of 
the UltraSPARC Architecture TSO memory model has sufficient background. 


This chapter describes memory models in these sections:

• Memory Location Identification on page 400.
• Memory Accesses and Cacheability on page 400.
• Memory Addressing and Alternate Address Spaces on page 403.
• SPARC V9 Memory Model on page 407.
• The UltraSPARC Architecture Memory Model — TSO on page 410.
• Nonfaulting Load on page 419.
• Store Coalescing on page 420.
9.1 Memory Location Identification


A memory location is identified by an 8-bit address space identifier (ASI) and a 64- 
bit memory address. The 8-bit ASI can be obtained from an ASI register or included 
in a memory access instruction. The ASI used for an access can distinguish among 
different 64-bit address spaces, such as Primary memory space, Secondary memory 
space, and internal control registers. It can also apply attributes to the access, such as 
whether the access should be performed in big- or little-endian byte order, or 
whether the address should be taken as a virtual, real, or physical address. 





9.2 Memory Accesses and Cacheability


Memory is logically divided into real memory (cached) and I/O memory
(noncached, with and without side effects) spaces.


Real memory stores information without side effects. A load operation returns the
value most recently stored. Operations are side-effect-free in the sense that a load,
store, or atomic load-store to a location in real memory has no program-observable
effect, except upon that location (or, in the case of a load or load-store, on the
destination register).


I/O locations may not behave like memory and may have side effects. Load, store,
and atomic load-store operations performed on I/O locations may have observable
side effects, and loads may not return the value most recently stored. The value
semantics of operations on I/O locations are not defined by the memory models, but
the constraints on the order in which operations are performed are the same as they
would be if the I/O locations were real memory. The storage properties, contents,
semantics, ASI assignments, and addresses of I/O registers are implementation
dependent.


9.2.1 Coherence Domains


Two types of memory operations are supported in the UltraSPARC Architecture: 
cacheable and noncacheable accesses. The manner in which addresses are 
differentiated is implementation dependent. In some implementations, it is indicated 
in the page translation entry (TTE.cp), while in other implementations, it is 
indicated by a bit in the physical address. 


400 UltraSPARC Architecture 2005 * Draft D0.9.2, 19 Jun 2008 


Although SPARC V9 does not specify memory ordering between cacheable and 
noncacheable accesses, the UltraSPARC Architecture maintains TSO ordering 
between memory references regardless of their cacheability. 


The UltraSPARC Architecture obeys the Sun-5 Ordering rules as documented in the 
“Sun-4u/Sun-5 Ordering with TSO” specification. 


9.2.1.1 Cacheable Accesses 


Accesses within the coherence domain are called cacheable accesses. They have these 
properties: 

• Data reside in real memory locations.

• Accesses observe supported cache coherency protocol(s).

• The cache line size is 2ⁿ bytes (where n ≥ 4), and can be different for each cache.


9.2.1.2 Noncacheable Accesses


Noncacheable accesses are outside of the coherence domain. They have the
following properties:


• Data might not reside in real memory locations. Accesses may result in
programmer-visible side effects; memory-mapped I/O control registers are an
example.

• Accesses do not observe supported cache coherency protocol(s).

• The smallest unit in each transaction is a single byte.


The UltraSPARC Architecture MMU optionally includes an attribute bit in each page 
translation, TTE.e, which when set signifies that this page has side effects. 


Noncacheable accesses without side effects (TTE.e = 0) are processor-consistent and 
obey TSO memory ordering. In particular, processor consistency ensures that a 
noncacheable load that references the same location as a previous noncacheable store 
will load the data from the previous store. 


Noncacheable accesses with side effects (TTE.e = 1) are processor-consistent and are
strongly ordered. These accesses are described in more detail in the following
section.


9.2.1.3 Noncacheable Accesses with Side-Effect 


Loads, stores, and load-stores to I/O locations might not behave with memory 
semantics. Loads and stores could have side effects; for example, a read access could 
clear a register or pop an entry off a FIFO. A write access could set a register address 
port so that the next access to that address will read or write a particular internal 
register. Such devices are considered order-sensitive. Also, such devices may only
allow accesses of a fixed size, so merging adjacent stores, or stores within a
16-byte region, would cause an error (see Store Coalescing on page 420).
Noncacheable accesses (other than block loads and block stores) to pages with side
effects (TTE.e = 1) exhibit the following behavior:


• Noncacheable accesses are strongly ordered with respect to each other. Bus
protocol should guarantee that I/O transactions to the same device are delivered in
the order that they are received.

• Noncacheable loads with TTE.e = 1 will not be issued to the system until
all previous instructions have completed and the store queue is empty.

• Noncacheable store coalescing is disabled for accesses with TTE.e = 1.

• A MEMBAR may be needed between side-effect and non-side-effect accesses. See
TABLE 9-3 on page 417.


Whether block loads and block stores adhere to the above behavior or ignore TTE.e
and always behave as if TTE.e = 0 is implementation dependent (impl. dep. #410-S10,
#411-S10).


On UltraSPARC Architecture virtual processors, noncacheable and side-effect 
accesses do not observe supported cache coherency protocols (impl. dep. #120). 





Nonfaulting loads (using ASI_PRIMARY_NO_FAULT[_LITTLE] or
ASI_SECONDARY_NO_FAULT[_LITTLE]) with TTE.e = 1 cause a trap.








Prefetches to noncacheable addresses result in nops. 


The processor does speculative instruction memory accesses and follows branches 
that it predicts are taken. Instruction addresses mapped by the MMU can be 
accessed even though they are not actually executed by the program. Normally, 
locations with side effects or that generate timeouts or bus errors are not mapped as 
instruction addresses by the MMU, so these speculative accesses will not cause 
problems. 


IMPL. DEP. #118-V9: The manner in which I/O locations are identified is 
implementation dependent. 


IMPL. DEP. #120-V9: The coherence and atomicity of memory operations between 
virtual processors and I/O DMA memory accesses are implementation dependent. 


V9 Compatibility Note | Operations to I/O locations are not guaranteed to be
sequentially consistent among themselves, as they are in SPARC V8.


Systems supporting SPARC V8 applications that use memory-mapped I/O locations 
must ensure that SPARC V8 sequential consistency of I/O locations can be 
maintained when those locations are referenced by a SPARC V8 application. The 
MMU either must enforce such consistency or cooperate with system software or the 
virtual processor to provide it. 


IMPL. DEP. #121-V9: An implementation may choose to identify certain addresses 
and use an implementation-dependent memory model for references to them. 
9.3 Memory Addressing and Alternate Address Spaces


An address in SPARC V9 is a tuple consisting of an 8-bit address space identifier 
(ASI) and a 64-bit byte-address offset within the specified address space. Memory is 
byte-addressed, with halfword accesses aligned on 2-byte boundaries, word accesses 
(which include instruction fetches) aligned on 4-byte boundaries, extended-word 
and doubleword accesses aligned on 8-byte boundaries, and quadword quantities 
aligned on 16-byte boundaries. With the possible exception of the cases described in 
Memory Alignment Restrictions on page 116, an improperly aligned address in a load, 
store, or load-store instruction always causes a trap to occur. The largest datum that 
is guaranteed to be atomically read or written is an aligned doubleword¹. Also,
memory references to different bytes, halfwords, and words in a given doubleword 
are treated for ordering purposes as references to the same location. Thus, the unit of 
ordering for memory is a doubleword. 
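The alignment rule and the doubleword ordering unit can be expressed as two small address checks. This is a sketch with hypothetical helper names: it treats every misaligned access as misaligned and does not model the exceptions described in Memory Alignment Restrictions on page 116.

```python
# Access sizes in bytes: byte, halfword, word, doubleword/extended word, quadword.
VALID_SIZES = (1, 2, 4, 8, 16)


def is_aligned(addr: int, size: int) -> bool:
    """True if a `size`-byte access at `addr` is naturally aligned."""
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported access size: {size}")
    if not 0 <= addr < 2 ** 64:
        raise ValueError("address must be a 64-bit unsigned value")
    return (addr & (size - 1)) == 0


def ordering_unit(addr: int) -> int:
    """Index of the aligned doubleword that contains `addr`."""
    return addr >> 3


def same_ordering_location(a: int, b: int) -> bool:
    """Bytes in one aligned doubleword count as the same location for ordering."""
    return ordering_unit(a) == ordering_unit(b)
```

Under this model, a word store to 0x1004 and a byte load from 0x1000 are ordered as references to the same location, while an access to 0x1008 falls in the next doubleword.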


Notes | The doubleword is the coherency unit for update, but 
programmers should not assume that doubleword floating-point 
values are updated as a unit unless they are doubleword-aligned 
and always updated with double-precision loads and stores. 
Some programs use pairs of single-precision operations to load 
and store double-precision floating-point values when the 
compiler cannot determine that they are doubleword aligned. 


Also, although quad-precision operations are defined in the 
SPARC V9 architecture, the granularity of loads and stores for 
quad-precision floating-point values may be word or 
doubleword. 





9.3.1 Memory Addressing Types


The UltraSPARC Architecture supports the following types of memory addressing: 


Virtual Addresses (VA). Virtual addresses are addresses produced by a virtual 
processor that maps all systemwide, program-visible memory. Virtual addresses are 
translated by the MMU in order to locate data in physical memory. Virtual addresses 
can be presented in nonprivileged mode and privileged mode, or in hyperprivileged 
mode using the ASI_AS_IF_USER* ASI variants. 





1. Two exceptions to this are the special ASIs ASI_TWIN_DW_NUCLEUS[_L] and ASI_TWINX_REAL[_L], which
provide hardware support for an atomic quad load to be used for TTE loads from TSBs.
Real addresses (RA). A real address is provided to privileged software to
describe the underlying physical memory allocated to it. Translation storage buffers
(TSBs) maintained by privileged software are used to translate privileged or
nonprivileged mode virtual addresses into real addresses. MMU bypass addresses in
privileged mode are also real addresses.


Physical addresses (PA). A physical address is one that appears on the system
bus and is the same as the physical addresses in legacy architectures.
Hyperprivileged software is responsible for managing the translation of real
addresses into physical addresses.


Nonprivileged software only uses virtual addresses. Privileged software uses virtual
and real addresses. Hyperprivileged software uses physical addresses, except when
the explicit ASI_AS_IF_USER* or ASI_*REAL* ASI variants are used for load and
store alternate instructions.


9.3.2 Memory Address Spaces


The UltraSPARC Architecture supports accessing memory using virtual, real, or 
physical addresses. Multiple virtual address spaces within the same real address 
space are distinguished by a context identifier (context ID). Multiple real address 
spaces within the same physical address space are distinguished by a partition 
identifier (partition ID). 


Privileged software can create multiple virtual address spaces, using the primary 
and secondary context registers to associate a context ID with every virtual address. 
Privileged software manages the allocation of context IDs. 


Hyperprivileged software can create multiple real address spaces, using the partition 
register to associate a partition ID with every real address. Hyperprivileged software 
manages the allocation of partition IDs. 


IMPL. DEP. #___: The number of bits in the partition register is implementation
dependent.


The full representation of each type of address is as follows:

    real address     = context ID :: virtual address

    physical address = partition ID :: real address
                     = partition ID :: context ID :: virtual address
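Reading `::` as concatenation of identifier fields, the representations above can be modeled as below. This is a sketch under stated assumptions: the context-ID and partition-ID widths are placeholders chosen for illustration (the partition-register width is implementation dependent), and real hardware translates addresses rather than concatenating them. The point is only that the same virtual address in different contexts or partitions names a different location.

```python
VA_BITS = 64
CONTEXT_BITS = 13    # hypothetical width, for illustration only
PARTITION_BITS = 8   # hypothetical; the real width is implementation dependent


def concat(high: int, high_bits: int, low: int, low_bits: int) -> int:
    """Model the '::' operator: place `high` directly above `low`."""
    if not 0 <= high < (1 << high_bits):
        raise ValueError("high field out of range")
    if not 0 <= low < (1 << low_bits):
        raise ValueError("low field out of range")
    return (high << low_bits) | low


def full_real_address(context_id: int, va: int) -> int:
    """real address = context ID :: virtual address"""
    return concat(context_id, CONTEXT_BITS, va, VA_BITS)


def full_physical_address(partition_id: int, context_id: int, va: int) -> int:
    """physical address = partition ID :: context ID :: virtual address"""
    real = full_real_address(context_id, va)
    return concat(partition_id, PARTITION_BITS, real, CONTEXT_BITS + VA_BITS)
```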
Address Space Identifiers


The virtual processor provides an address space identifier with every address. This 
ASI may serve several purposes: 


• To identify which of several distinguished address spaces the 64-bit address offset
is addressing

• To provide additional access control and attribute information, for example, to
specify the endianness of the reference

• To specify the address of an internal control register in the virtual processor,
cache, or memory management hardware


Memory management hardware can associate an independent 2⁶⁴-byte memory
address space with each ASI. In practice, the three independent memory address 
spaces (contexts) created by the MMU are Primary, Secondary, and Nucleus. 


Programming Note | Independent address spaces, accessible through ASIs, make it
possible for system software to easily access the address space of
faulting software when processing exceptions or to implement
access to a client program's memory space by a server program.


Alternate-space load, store, load-store and prefetch instructions specify an explicit 
ASI to use for their data access. The behavior of the access depends on the current 
privilege mode. 


Non-alternate space load, store, load-store, and prefetch instructions use an implicit 
ASI value that is determined by current virtual processor state (the current privilege 
mode, trap level (TL), and the value of the PSTATE.cle). Instruction fetches use an 
implicit ASI that depends only on the current mode and trap level. 


The architecturally specified ASIs are listed in Chapter 10, Address Space Identifiers 
(ASIs). The operation of each ASI in nonprivileged, privileged and hyperprivileged 
modes is indicated in TABLE 10-1 on page 423. 


Attempts by nonprivileged software (PSTATE.priv = 0 and HPSTATE.hpriv = 0) to
access restricted ASIs (ASI bit 7 = 0) cause a privileged action exception. Attempts by
privileged software (PSTATE.priv = 1 and HPSTATE.hpriv = 0) to access ASIs 30₁₆ through
7F₁₆ cause a privileged action exception.


When TL = 0, instruction fetches by the virtual processor implicitly specify
ASI_PRIMARY, and normal loads and stores implicitly specify ASI_PRIMARY or
ASI_PRIMARY_LITTLE, depending on the setting of PSTATE.cle.





When TL = 1 or 2 (> 0 but ≤ MAXPTL), the implicit ASI in privileged mode is:

• for instruction fetches, ASI_NUCLEUS

• for loads and stores, ASI_NUCLEUS if PSTATE.cle = 0 or ASI_NUCLEUS_LITTLE
if PSTATE.cle = 1 (impl. dep. #124-V9).
In hyperprivileged mode, all instruction fetches and loads and stores with implicit
ASIs use a physical address, regardless of the value of TL. 
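The implicit-ASI rules above amount to a small selection function. In the sketch below the ASI numbers are the SPARC V9 assignments for ASI_NUCLEUS, ASI_NUCLEUS_LITTLE, ASI_PRIMARY, and ASI_PRIMARY_LITTLE; the function name and the use of None for hyperprivileged (physically addressed) accesses are illustrative choices, and only TL values up to MAXPTL = 2 are handled.

```python
# ASI numbers from the SPARC V9 ASI assignments.
ASI_NUCLEUS = 0x04
ASI_NUCLEUS_LITTLE = 0x0C
ASI_PRIMARY = 0x80
ASI_PRIMARY_LITTLE = 0x88

MAXPTL = 2


def implicit_asi(tl: int, cle: int, fetch: bool, hyperprivileged: bool = False):
    """Return the implicit ASI for a non-alternate access.

    Returns None in hyperprivileged mode, where implicit accesses use
    physical addresses regardless of TL.
    """
    if hyperprivileged:
        return None
    if not 0 <= tl <= MAXPTL:
        raise ValueError("TL outside the range handled by this sketch")
    if tl == 0:
        if fetch:
            return ASI_PRIMARY  # instruction fetches do not depend on PSTATE.cle
        return ASI_PRIMARY_LITTLE if cle else ASI_PRIMARY
    # TL > 0 in privileged mode: nucleus context.
    if fetch:
        return ASI_NUCLEUS
    return ASI_NUCLEUS_LITTLE if cle else ASI_NUCLEUS
```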





SPARC V9 supports the PRIMARY[_LITTLE], SECONDARY[_LITTLE], and
NUCLEUS[_LITTLE] address spaces.























Accesses to other address spaces use the load/store alternate instructions. For these
accesses, the ASI is either contained in the instruction (for the register+register
addressing mode) or taken from the ASI register (for register+immediate
addressing).


ASIs are either nonrestricted, restricted-to-privileged, or restricted-to- 
hyperprivileged: 


• A nonrestricted ASI (ASI range 80₁₆ through FF₁₆) is one that may be used
independently of the privilege level (PSTATE.priv and HPSTATE.hpriv) at which
the virtual processor is running.


• A restricted-to-privileged ASI (ASI range 00₁₆ through 2F₁₆) requires that the virtual
processor be in privileged or hyperprivileged mode for a legal access to occur.


• A restricted-to-hyperprivileged ASI (ASI range 30₁₆ through 7F₁₆) requires that the
virtual processor be in hyperprivileged mode for a legal access to occur.


The relationship between virtual processor state and ASI restriction is shown in 
TABLE 9-1. 


TABLE 9-1 Allowed Accesses to ASIs


ASI Value | Type                          | Result of ASI Access in NP Mode | Result of ASI Access in P Mode | Result of ASI Access in HP Mode
00₁₆-2F₁₆ | Restricted-to-privileged      | privileged action exception     | Valid access                   | Valid access
30₁₆-7F₁₆ | Restricted-to-hyperprivileged | privileged action exception     | privileged action exception    | Valid access
80₁₆-FF₁₆ | Nonrestricted                 | Valid access                    | Valid access                   | Valid access
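TABLE 9-1 reduces to comparing the current privilege mode with the lowest mode allowed for the ASI's range. The sketch below encodes that comparison; it uses hypothetical names and checks only the privilege restriction, not whether a particular ASI is implemented or what a legal access to it does.

```python
NP, P, HP = 0, 1, 2  # nonprivileged, privileged, hyperprivileged


def asi_type(asi: int) -> str:
    """Classify an 8-bit ASI value by range, as in TABLE 9-1."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("an ASI is an 8-bit value")
    if asi >= 0x80:
        return "nonrestricted"
    if asi >= 0x30:
        return "restricted-to-hyperprivileged"
    return "restricted-to-privileged"


# Lowest privilege mode that may legally use each type of ASI.
MIN_MODE = {
    "nonrestricted": NP,
    "restricted-to-privileged": P,
    "restricted-to-hyperprivileged": HP,
}


def asi_access_result(asi: int, mode: int) -> str:
    """Result of an alternate-space access to `asi` from `mode`."""
    if mode >= MIN_MODE[asi_type(asi)]:
        return "valid access"
    return "privileged action exception"
```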


Some restricted ASIs are provided as mandated by SPARC V9:
ASI_AS_IF_USER_PRIMARY[_LITTLE] and ASI_AS_IF_USER_SECONDARY[_LITTLE]. The
intent of these ASIs is to give privileged software efficient, yet secure access to the
memory space of nonprivileged software.




















The normal address space is the primary address space, which is accessed by the
nonrestricted ASI_PRIMARY[_LITTLE] ASIs. The secondary address space, which is
accessed by the nonrestricted ASI_SECONDARY[_LITTLE] ASIs, is provided to allow
server software to access client software's address space.
ASI_PRIMARY_NOFAULT[_LITTLE] and ASI_SECONDARY_NOFAULT[_LITTLE]
support nonfaulting loads. These ASIs may be used to color (that is, distinguish into 
classes) loads in the instruction stream so that, in combination with a judicious 
mapping of low memory and a specialized trap handler, an optimizing compiler can 
move loads outside of conditional control structures. 





9.4 SPARC V9 Memory Model 


The SPARC V9 processor architecture specified the organization and structure of a 
central processing unit but did not specify a memory system architecture. This 
section summarizes the MMU support required by an UltraSPARC Architecture 
processor. 


The memory models specify the possible order relationships between memory- 
reference instructions issued by a virtual processor and the order and visibility of 
those instructions as seen by other virtual processors. The memory model is 
intimately intertwined with the program execution model for instructions. 


9.4.1 SPARC V9 Program Execution Model 


The SPARC V9 strand model of a virtual processor consists of three units: an Issue 
Unit, a Reorder Unit, and an Execute Unit, as shown in FIGURE 9-1. 


FIGURE 9-1 Processor Model: Uniprocessor System (Processor: Reorder Unit, Execute Unit; Instruction Path; Data Path)


The Issue Unit reads instructions over the instruction path from memory and issues 
them in program order to the Reorder Unit. Program order is precisely the order 
determined by the control flow of the program and the instruction semantics, under 
the assumption that each instruction is performed independently and sequentially. 
Issued instructions are collected and potentially reordered in the Reorder Unit, and
then dispatched to the Execute Unit. Instruction reordering allows an 
implementation to perform some operations in parallel and to better allocate 
resources. The reordering of instructions is constrained to ensure that the results of 
program execution are the same as they would be if the instructions were performed 
in program order. This property is called processor self-consistency. 


Processor self-consistency requires that the result of execution, in the absence of any 
shared memory interaction with another virtual processor, be identical to the result 
that would be observed if the instructions were performed in program order. In the 
model in FIGURE 9-1, instructions are issued in program order and placed in the 
reorder buffer. The virtual processor is allowed to reorder instructions, provided it 
does not violate any of the data-flow constraints for registers or for memory. 


The data-flow order constraints for register reference instructions are these: 


1. An instruction that reads from or writes to a register cannot be performed until all 
earlier instructions that write to that register have been performed (read-after- 
write hazard; write-after-write hazard). 


2. An instruction cannot be performed that writes to a register until all earlier 
instructions that read that register have been performed (write-after-read hazard). 
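The two register rules can be checked mechanically from the sets of registers each instruction reads and writes. The sketch below uses hypothetical helper names, with SPARC register names serving only as labels; it reports which hazards keep a later instruction behind an earlier one. The memory rules that follow have the same shape, with memory locations in place of registers.

```python
def hazards(earlier_reads, earlier_writes, later_reads, later_writes):
    """Data-flow hazards that order a later instruction after an earlier one."""
    found = set()
    if set(earlier_writes) & set(later_reads):
        found.add("RAW")  # rule 1: read after write
    if set(earlier_writes) & set(later_writes):
        found.add("WAW")  # rule 1: write after write
    if set(earlier_reads) & set(later_writes):
        found.add("WAR")  # rule 2: write after read
    return found


def may_reorder(earlier, later):
    """earlier and later are (reads, writes) pairs; True if no hazard applies."""
    return not hazards(earlier[0], earlier[1], later[0], later[1])


# add %g2, %g3, %g1 followed by sub %g1, %g5, %g4: a read-after-write on %g1.
add = ({"%g2", "%g3"}, {"%g1"})
sub = ({"%g1", "%g5"}, {"%g4"})
```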


V9 Compatibility Note | An implementation can avoid blocking instruction execution in
case 2 and the write-after-write hazard in case 1 by using a
renaming mechanism that provides the old value of the register
to earlier instructions and the new value to later uses.


The data-flow order constraints for memory-reference instructions are those for 
register reference instructions, plus the following additional constraints: 


1. A memory-reference instruction that uses (loads or stores) the value at a location 
cannot be performed until all earlier memory-reference instructions that set (store 
to) that location have been performed (read-after-write hazard, write-after-write 
hazard). 


2. A memory-reference instruction that writes (stores to) a location cannot be
performed until all previous instructions that read (load from) that location have 
been performed (write-after-read hazard). 


The memory barrier instruction (MEMBAR) and the TSO memory model also constrain
the issue of memory-reference instructions. See Memory Ordering and Synchronization 
on page 415 and The UltraSPARC Architecture Memory Model — TSO on page 410 for 
a detailed description. 


The constraints on instruction execution assert a partial ordering on the instructions 
in the reorder buffer. Every one of the several possible orderings is a legal execution 
ordering for the program. See Appendix D, Formal Specification of the Memory Models, 
for more information. 
Virtual Processor/Memory Interface Model


Each UltraSPARC Architecture virtual processor in a multiprocessor system is 
modeled as shown in FIGURE 9-2; that is, having two independent paths to memory: 
one for instructions and one for data. 

FIGURE 9-2 Data Memory Paths: Multiprocessor System (Virtual Processors, each with Instructions and Data paths; Memory Transactions in Memory Order)

Data caches are maintained by hardware so their contents always appear to be 
consistent (coherent). Instruction caches are not required to be kept consistent with 
data caches and therefore require explicit program (software) action to ensure 
consistency when a program modifies an executing instruction stream. See 
Synchronizing Instruction and Data Memory on page 418 for details. Memory is shared 
in terms of address space, but it may be nonhomogeneous and distributed in an 
implementation. Mapping and caches are ignored in the model, since their functions 
are transparent to the memory model. 


In real systems, addresses may have attributes that the virtual processor must 
respect. The virtual processor executes loads, stores, and atomic load-stores in 
whatever order it chooses, as constrained by program order and the memory model. 
The ASI-address couples it generates are translated by a memory management unit 
(MMU), which associates attributes with the address and may, in some instances, 
abort the memory transaction and signal an exception to the virtual processor. 


For example, a region of memory may be marked as nonprefetchable, noncacheable, 
read-only, or restricted. It is the MMU’s responsibility, working in conjunction with 
system software, to ensure that memory attribute constraints are not violated. See 
implementation-specific MMU documentation for detailed information about how 
this is accomplished in each UltraSPARC Architecture implementation. 


1. The model described here is only a model; implementations of UltraSPARC Architecture systems are
unconstrained as long as their observable behaviors match those of the model.
Instructions are performed in an order constrained by local dependencies. Using this
dependency ordering, an execution unit submits one or more pending memory 
transactions to the memory. The memory performs transactions in memory order. The 
memory unit may perform transactions submitted to it out of order; hence, the 
execution unit must not concurrently submit two or more transactions that are 
required to be ordered, unless the memory unit can still guarantee in-order 
semantics. 


The memory accepts transactions, performs them, and then acknowledges their 
completion. Multiple memory operations may be in progress at any time and may be 
initiated in a nondeterministic fashion in any order, provided that all transactions to 
a location preserve the per-virtual processor partial orderings. Memory transactions 
may complete in any order. Once initiated, all memory operations are performed 
atomically: loads from one location all see the same value, and the result of stores is 
visible to all potential requestors at the same instant. 


The order of memory operations observed at a single location is a total order that 
preserves the partial orderings of each virtual processor's transactions to this 
address. There may be many legal total orders for a given program's execution. 





9.5 The UltraSPARC Architecture Memory Model — TSO


The UltraSPARC Architecture is a model that specifies the behavior observable by 
software on UltraSPARC Architecture systems. Therefore, access to memory can be 
implemented in any manner, as long as the behavior observed by software conforms 
to that of the models described here. 


The SPARC V9 architecture defines three different memory models: Total Store Order
(TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). 


All SPARC V9 processors must provide Total Store Order (or a more strongly 
ordered model, for example, Sequential Consistency) to ensure compatibility for 
SPARC V8 application software. 


All UltraSPARC Architecture virtual processors implement TSO ordering. The PSO 
and RMO models from SPARC V9 are not described in this UltraSPARC Architecture 
specification. UltraSPARC Architecture 2005 processors do not implement the PSO 
memory model directly, but all software written to run under PSO will execute 
correctly on an UltraSPARC Architecture 2005 processor (using the TSO model). 


Whether memory models represented by PSTATE.mm = 10₂ or 11₂ are supported in
an UltraSPARC Architecture processor is implementation dependent (impl. dep.
#113-V9-Ms10). If the 10₂ model is supported, then when PSTATE.mm = 10₂ the
implementation must correctly execute software that adheres to the RMO model
described in The SPARC Architecture Manual-Version 9. If the 11₂ model is supported,
its definition is implementation dependent and will be described in
implementation-specific documentation.


Programs written for Relaxed Memory Order will work in both Partial Store Order 
and Total Store Order. Programs written for Partial Store Order will work in Total 
Store Order. Programs written for a weak model, such as RMO, may execute more 
quickly when run on hardware directly supporting that model, since the model 
exposes more scheduling opportunities, but use of that model may also require extra 
instructions to ensure synchronization. Multiprocessor programs written for a 
stronger model will behave unpredictably if run in a weaker model. 


Machines that implement sequential consistency (also called "strong ordering" or 
"strong consistency") automatically support programs written for TSO. Sequential 
consistency is not a SPARC V9 memory model. In sequential consistency, the loads, 
stores, and atomic load-stores of all virtual processors are performed by memory in 
a serial order that conforms to the order in which these instructions are issued by 
individual virtual processors. A machine that implements sequential consistency 
may deliver lower performance than an equivalent machine that implements TSO 
order. Although particular SPARC V9 implementations may support sequential 
consistency, portable software must not rely on the sequential consistency memory 
model. 


Memory Model Selection 


The active memory model is specified by the 2-bit value in PSTATE.mm. The value
00₂ represents the TSO memory model; increasing values of PSTATE.mm indicate
increasingly weaker (less strongly ordered) memory models.


Writing a new value into PSTATE.mm causes subsequent memory reference 
instructions to be performed with the order constraints of the specified memory 
model. 


IMPL. DEP. #119-Ms10: The effect of an attempt to write an unsupported memory
model designation into PSTATE.mm is implementation dependent; however, it
should never result in a PSTATE.mm value greater than the one that was
written. In the case of an UltraSPARC Architecture implementation that only
supports the TSO memory model, PSTATE.mm always reads as zero and attempts to
write to it are ignored.
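The constraint in impl. dep. #119 can be illustrated with a small model of PSTATE.mm. The write policy below (keep the weakest supported model that is no weaker than the one requested) is only one hypothetical choice that satisfies the rule; the class and constant names are also hypothetical. For a TSO-only implementation it reproduces the required behavior: every write leaves the field reading as zero.

```python
TSO, PSO, RMO, MM_IMPL = 0b00, 0b01, 0b10, 0b11  # 11 is implementation defined


class PstateMM:
    """Hypothetical model of PSTATE.mm write handling (impl. dep. #119)."""

    def __init__(self, supported=frozenset({TSO})):
        if TSO not in supported:
            raise ValueError("TSO must always be supported")
        self.supported = frozenset(supported)
        self.value = TSO

    def write(self, requested: int) -> None:
        if not 0 <= requested <= 0b11:
            raise ValueError("PSTATE.mm is a 2-bit field")
        # Never end up weaker (numerically greater) than what was written.
        self.value = max(m for m in self.supported if m <= requested)

    def read(self) -> int:
        return self.value
```

Falling back to a stronger model is always safe, because programs written for a weaker model also run correctly under a stronger one.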
Programmer-Visible Properties of the UltraSPARC Architecture TSO Model


Total Store Order must be provided for compatibility with existing SPARC V8 
programs. Programs that execute correctly in either RMO or PSO will execute 
correctly in the TSO model. 


The rules for TSO, in addition to those required for self-consistency (see page 408), 
are: 

• Loads are blocking and ordered with respect to earlier loads.

• Stores are ordered with respect to stores.

• Atomic load-stores are ordered with respect to loads and stores.

• Stores cannot bypass earlier loads.


Programming Note | Loads can bypass earlier stores to other addresses, which
maintains processor self-consistency.


Atomic load-stores are treated as both a load and a store and can only be applied to 
cacheable address spaces. 


Thus, TSO ensures the following behavior: 


• Each load instruction behaves as if it were followed by a MEMBAR #LoadLoad
and #LoadStore.

• Each store instruction behaves as if it were followed by a MEMBAR
#StoreStore.

• Each atomic load-store behaves as if it were followed by a MEMBAR #LoadLoad,
#LoadStore, and #StoreStore.
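For plain cacheable loads and stores, the implied barriers above leave exactly one ordering that software must request explicitly. The sketch below uses the SPARC V9 MEMBAR mmask encodings (#LoadLoad = 0x1, #StoreLoad = 0x2, #LoadStore = 0x4, #StoreStore = 0x8); the table and function names are hypothetical. Atomic, block, and noncacheable operations are deliberately left out; their requirements are given in TABLE 9-2.

```python
# MEMBAR mmask bits as encoded in SPARC V9.
LOAD_LOAD, STORE_LOAD, LOAD_STORE, STORE_STORE = 0x1, 0x2, 0x4, 0x8

# Ordering that TSO provides implicitly after a plain load or store.
IMPLIED_AFTER = {
    "load": LOAD_LOAD | LOAD_STORE,
    "store": STORE_STORE,
}

# Ordering bit needed to keep `first` ahead of `second`.
NEEDED = {
    ("load", "load"): LOAD_LOAD,
    ("load", "store"): LOAD_STORE,
    ("store", "load"): STORE_LOAD,
    ("store", "store"): STORE_STORE,
}


def explicit_membar_mask(first: str, second: str) -> int:
    """mmask bits that must still be requested explicitly under TSO."""
    return NEEDED[(first, second)] & ~IMPLIED_AFTER[first]
```

Only the store-then-load pair yields a nonzero mask, which is why MEMBAR #StoreLoad is the one barrier commonly needed in TSO code.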


In addition to the above TSO rules, the following rules apply to UltraSPARC 
Architecture memory models: 


• A MEMBAR #StoreLoad must be used to prevent a load from bypassing a prior
store, if Strong Sequential Order (as defined in The UltraSPARC Architecture
Memory Model — TSO on page 410) is desired.


• Accesses that have side effects are all strongly ordered with respect to each other.


• A MEMBAR #Lookaside is not needed between a store and a subsequent load to
the same noncacheable address.


• Load (LDXA) and store (STXA) instructions that reference certain internal ASIs
perform both an intra-virtual processor synchronization (that is, an implicit
MEMBAR #Sync operation before the load or store is executed) and an
inter-virtual processor synchronization (that is, all active virtual processors are brought
to a point where synchronization is possible, the load or store is executed, and all
virtual processors then resume instruction fetch and execution). The
model-specific PRM should indicate which ASIs require intra-virtual processor
synchronization, inter-virtual processor synchronization, or both.
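The first rule above is the classic store-buffering case. The sketch below exhaustively explores a simplified TSO model in which each virtual processor has one FIFO store buffer, a load first checks its own buffer and otherwise reads memory, and MEMBAR #StoreLoad waits until the issuing processor's buffer has drained. It is an illustrative model, not a description of any implementation, and the function name is hypothetical.

```python
def store_buffering_outcomes(with_membar: bool) -> set:
    """All (r0, r1) results of the store-buffering test under a toy TSO model.

    Thread t executes: store 1 to x[t]; [MEMBAR #StoreLoad]; r[t] = load x[1-t].
    Each thread has a FIFO store buffer; memory starts as x = [0, 0].
    """
    programs = []
    for t in (0, 1):
        prog = [("st", t)]
        if with_membar:
            prog.append(("membar",))
        prog.append(("ld", 1 - t))
        programs.append(prog)

    outcomes = set()

    def explore(pcs, bufs, mem, regs):
        finished = all(pcs[t] == len(programs[t]) for t in (0, 1))
        if finished and not any(bufs):
            outcomes.add(tuple(regs))
            return
        for t in (0, 1):
            # Step A: the oldest buffered store of thread t reaches memory.
            if bufs[t]:
                var, val = bufs[t][0]
                new_mem = list(mem)
                new_mem[var] = val
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t][1:]
                explore(pcs, new_bufs, new_mem, regs)
            # Step B: thread t executes its next instruction, if any.
            if pcs[t] == len(programs[t]):
                continue
            op = programs[t][pcs[t]]
            new_pcs = list(pcs)
            new_pcs[t] += 1
            if op[0] == "st":
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t] + [(op[1], 1)]
                explore(new_pcs, new_bufs, mem, regs)
            elif op[0] == "membar":
                if not bufs[t]:  # #StoreLoad: own store buffer must be empty
                    explore(new_pcs, bufs, mem, regs)
            else:  # load: newest matching buffered store, else memory
                var = op[1]
                val = mem[var]
                for addr, v in bufs[t]:
                    if addr == var:
                        val = v
                new_regs = list(regs)
                new_regs[t] = val
                explore(new_pcs, bufs, mem, new_regs)

    explore([0, 0], [[], []], [0, 0], [None, None])
    return outcomes
```

Without the barrier the model admits (0, 0), a result that no sequentially consistent interleaving can produce; with MEMBAR #StoreLoad between each store and the following load, that result disappears.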


TSO Ordering Rules 


TABLE 9-2 summarizes the cases where a MEMBAR must be inserted between two 
memory operations on an UltraSPARC Architecture virtual processor running in 
TSO mode, to ensure that the operations appear to complete in a particular order. 
Memory operation ordering is not to be confused with processor consistency or 
deterministic operation; MEMBARs are required for deterministic operation of
certain ASI register updates. 


Programming Note: To ensure software portability across systems, the MEMBAR
rules in this section should be followed (which may be stronger than the rules in
SPARC V9).


TABLE 9-2 is to be read as follows: Reading from row to column, the first memory 
operation in program order in a row is followed by the memory operation found in 
the column. Symbols used as table entries: 


• #: No intervening operation is required.


• M: An intervening MEMBAR #StoreLoad, MEMBAR #Sync, or
MEMBAR #MemIssue is required.


• S: An intervening MEMBAR #Sync or MEMBAR #MemIssue is required.
• nc: Noncacheable
• e: Side effect


• ne: No side effect
TABLE 9-2 Summary of UltraSPARC Architecture Ordering Rules (TSO Memory Model)

                    To Memory Operation C (column):
From Memory         load   store  atomic bload  bstore load   store  load   store  bload  bstore
Operation R (row):                                     nc-e   nc-e   nc-ne  nc-ne  nc     nc
load                #      #      #      S      S      #      #      #      #      S      S
store               M      #      #      M      S      M      #      M      #      M      S
atomic              #      #      #      M      S      #      #      #      #      M      S
bload               S      S      S      S      S      S      S      S      S      S      S
bstore              M      S      M      M      S      M      S      M      S      M      S
load nc-e           #      #      #      S      S      #      #      #      #      S      S
store nc-e          S      #      #      S      S      #      #      M      #      M      S
load nc-ne          #      #      #      S      S      #      #      #      #      S      S
store nc-ne         S      #      #      S      S      M      #      M      #      M      S
bload nc            S      S      S      S      S      S      S      S      S      S      S
bstore nc           S      S      S      S      S      M      S      M      S      M      S











1. This table assumes that both noncacheable operations access the same device. 


2. When the store and subsequent load access the same location, no intervening MEMBAR is required. 
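The cacheable rows and columns of TABLE 9-2 fit naturally into a small lookup table. This C sketch encodes that 5 x 5 corner of the table and returns one MEMBAR that satisfies each entry. The names are hypothetical. The sketch does not model note 2, so a store followed by a load of the same location is still reported as needing #StoreLoad.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_LOAD, OP_STORE, OP_ATOMIC, OP_BLOAD, OP_BSTORE, N_OPS } mem_op;

/* Cacheable subset of TABLE 9-2: row = earlier operation, column = later one.
   Column order: load, store, atomic, bload, bstore. */
static const char tso_rule[N_OPS][N_OPS + 1] = {
    [OP_LOAD]   = "###SS",
    [OP_STORE]  = "M##MS",
    [OP_ATOMIC] = "###MS",
    [OP_BLOAD]  = "SSSSS",
    [OP_BSTORE] = "MSMMS",
};

/* Raw table entry: '#', 'M', or 'S'. */
static char tso_rule_entry(mem_op earlier, mem_op later) {
    assert(earlier < N_OPS && later < N_OPS);
    return tso_rule[earlier][later];
}

/* One MEMBAR that satisfies the entry, or NULL if none is required. */
static const char *membar_between(mem_op earlier, mem_op later) {
    switch (tso_rule_entry(earlier, later)) {
    case 'M': return "membar #StoreLoad"; /* #Sync or #MemIssue also qualify */
    case 'S': return "membar #Sync";      /* #MemIssue also qualifies */
    default:  return NULL;
    }
}
```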


9.5.4 Hardware Primitives for Mutual Exclusion


In addition to providing memory-ordering primitives that allow programmers to 
construct mutual-exclusion mechanisms in software, the UltraSPARC Architecture 
provides three hardware primitives for mutual exclusion: 


• Compare and Swap (CASA and CASXA)
• Load Store Unsigned Byte (LDSTUB and LDSTUBA)
• Swap (SWAP and SWAPA)


Each of these instructions has the semantics of both a load and a store in all three 
memory models. They are all atomic, in the sense that no other store to the same 
location can be performed between the load and store elements of the instruction. 
All of the hardware mutual-exclusion operations conform to the TSO memory model 
and may require barrier instructions to ensure proper data visibility. 


Atomic load-store instructions can be used only in the cacheable domains (not in
noncacheable I/O addresses). An attempt to use an atomic load-store instruction to
access a noncacheable page results in a data_access_exception exception.

The atomic load-store alternate instructions can use a limited set of the ASIs. See the
specific instruction descriptions for a list of the valid ASIs. An attempt to execute an
atomic load-store alternate instruction with an invalid ASI results in a
data_access_exception exception.


9.5.4.1 Compare-and-Swap (CASA, CASXA)


Compare-and-swap is an atomic operation that compares a value in a virtual 
processor register to a value in memory and, if and only if they are equal, swaps the 
value in memory with the value in a second virtual processor register. Both 32-bit 
(CASA) and 64-bit (CASXA) operations are provided. The compare-and-swap 
operation is atomic in the sense that once it begins, no other virtual processor can 
access the memory location specified until the compare has completed and the swap 
(if any) has also completed and is potentially visible to all other virtual processors in 
the system. 
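C11 atomics can model the CASXA semantics. In this sketch, the function name is hypothetical and atomic_compare_exchange_strong stands in for the instruction. The function returns the old memory contents. CASXA leaves that value in its destination register whether or not the swap took place, so the caller detects success by comparing the returned value with the compare operand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of CASXA: if *mem == cmp, store swap into *mem.
   Either way, return the value *mem held beforehand (CASXA writes it to rd). */
static uint64_t casx_model(_Atomic uint64_t *mem, uint64_t cmp, uint64_t swap) {
    assert(mem);
    uint64_t observed = cmp;
    /* On failure, C11 overwrites `observed` with the current contents of *mem;
       on success it stays equal to cmp, which was also the old contents. */
    atomic_compare_exchange_strong(mem, &observed, swap);
    return observed;
}
```

For example, if casx_model(&word, 0, 1) returns 0, the caller installed 1 in a word that previously held 0.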


Compare-and-swap is substantially more powerful than the other hardware 
synchronization primitives. It has an infinite consensus number; that is, it can 
resolve, in a wait-free fashion, an infinite number of contending processes. Because 
of this property, compare-and-swap can be used to construct wait-free algorithms 
that do not require the use of locks. For examples, see Programming with the Memory 
Models, contained in the separate volume UltraSPARC Architecture Application Notes. 
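This C11 sketch shows a lock-free operation built from compare-and-swap alone: fetch-and-add implemented as a CAS retry loop. The function name is hypothetical. A loop like this is lock-free rather than wait-free. The system as a whole always makes progress, but an individual thread can in principle keep retrying. Wait-free constructions built on CAS are more involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Fetch-and-add built only from compare-and-swap; returns the previous value. */
static uint64_t cas_fetch_add(_Atomic uint64_t *counter, uint64_t delta) {
    assert(counter);
    uint64_t old = atomic_load(counter);
    /* If another thread changed *counter since we read it, the CAS fails,
       `old` is refreshed with the current value, and we try again. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta))
        ;
    return old;
}
```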


9.5.4.2 Swap (SWAP)


SWAP atomically exchanges the lower 32 bits in a virtual processor register with a 
word in memory. SWAP has a consensus number of two; that is, it cannot resolve 
more than two contending processes in a wait-free fashion. 
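A consensus number of two limits what SWAP can do wait-free. It is still enough to build a simple mutual-exclusion lock for any number of threads. This C11 sketch uses atomic_exchange on a 32-bit word as a stand-in for SWAP. The names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* SWAP stand-in: store new_word and return the word previously in memory. */
static uint32_t swap_word(_Atomic uint32_t *word, uint32_t new_word) {
    assert(word);
    return atomic_exchange(word, new_word);
}

/* Spin lock on a word: 0 = free, 1 = held. */
static void swap_lock(_Atomic uint32_t *lock) {
    while (swap_word(lock, 1) != 0)
        ; /* the lock was already held; try again */
}

static void swap_unlock(_Atomic uint32_t *lock) {
    atomic_store(lock, 0);
}
```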


9.5.4.3 Load Store Unsigned Byte (LDSTUB) 


LDSTUB loads a byte value from memory to a register and writes the value FF₁₆ into
the addressed byte atomically. LDSTUB is the classic test-and-set instruction. Like
SWAP, it has a consensus number of two and so cannot resolve more than two
contending processes in a wait-free fashion.
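The classic use of LDSTUB is a byte-sized test-and-set lock. This C11 sketch models LDSTUB as an atomic_exchange of FF₁₆. The names are hypothetical. Between attempts, the lock spins on ordinary loads (the common test-and-test-and-set refinement), so waiting threads do not repeatedly write the lock byte.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* LDSTUB model: atomically write 0xFF to the byte and return its old value. */
static unsigned char ldstub_model(_Atomic unsigned char *byte) {
    assert(byte);
    return atomic_exchange(byte, (unsigned char)0xFF);
}

/* A zero byte means "free"; succeeds only if this call changed 0 to 0xFF. */
static bool ts_try_lock(_Atomic unsigned char *lock) {
    return ldstub_model(lock) == 0;
}

static void ts_lock(_Atomic unsigned char *lock) {
    while (!ts_try_lock(lock)) {
        /* Wait with plain loads until the lock looks free, then retry. */
        while (atomic_load(lock) != 0)
            ;
    }
}

static void ts_unlock(_Atomic unsigned char *lock) {
    atomic_store(lock, 0);
}
```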


9.5.5 Memory Ordering and Synchronization

The UltraSPARC Architecture provides some level of programmer control over
memory ordering and synchronization through the MEMBAR and FLUSH
instructions.
MEMBAR serves two distinct functions in SPARC V9. One variant of MEMBAR,
the ordering MEMBAR, provides a way for the programmer to control the order of
loads and stores issued by a virtual processor. The other variant of MEMBAR, the
sequencing MEMBAR, enables the programmer to explicitly control order and
completion for memory operations. Sequencing MEMBARs are needed only when a
program requires that the effect of an operation become globally visible rather than
simply being scheduled.¹ Because both forms are bit-encoded into the instruction, a
single MEMBAR can function both as an ordering MEMBAR and as a sequencing
MEMBAR.


The SPARC V9 instruction set architecture does not guarantee consistency between
instruction and data spaces. A problem arises when instruction space is dynamically
modified by a program writing to memory locations containing instructions
(self-modifying code). Examples include Lisp systems, debuggers, and dynamic linking.
The FLUSH instruction synchronizes instruction and data memory after instruction
space has been modified.


9.5.5.1 Ordering MEMBAR Instructions


Ordering MEMBAR instructions induce an ordering in the instruction stream of a
single virtual processor. Sets of loads and stores that appear before the MEMBAR in
program order are ordered with respect to sets of loads and stores that follow the
MEMBAR in program order. Atomic operations (LDSTUB(A), SWAP(A), CASA, and
CASXA) are ordered by MEMBAR as if they were both a load and a store, since they
share the semantics of both. An STBAR instruction, with semantics that are a subset
of MEMBAR, is provided for SPARC V8 compatibility. MEMBAR and STBAR
operate on all pending memory operations in the reorder buffer, independently of
their address or ASI, ordering them with respect to all future memory operations.
This ordering applies only to memory-reference instructions issued by the virtual
processor issuing the MEMBAR. Memory-reference instructions issued by other
virtual processors are unaffected.


The ordering relationships are bit-encoded as shown in TABLE 9-3. For example,
MEMBAR 01₁₆, written as "membar #LoadLoad" in assembly language, requires
that all load operations appearing before the MEMBAR in program order complete
before any of the load operations following the MEMBAR in program order
complete. Store operations are unconstrained in this case. MEMBAR 08₁₆
(#StoreStore) is equivalent to the STBAR instruction; it requires that the values
stored by store instructions appearing in program order prior to the STBAR
instruction be visible to other virtual processors before issuing any store operations
that appear in program order following the STBAR.


¹ Sequencing MEMBARs are needed for some input/output operations, forcing stores into specialized stable
storage, context switching, and occasional other system functions. Using a sequencing MEMBAR when one is
not needed may cause a degradation of performance. See Programming with the Memory Models, contained in
the separate volume UltraSPARC Architecture Application Notes, for examples of the use of sequencing
MEMBARs.


416 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 


In TABLE 9-3 these ordering relationships are specified by the “<m” symbol, which 
signifies memory order. See Appendix D, Formal Specification of the Memory Models, 
for a formal description of the <m relationship. 


TABLE 9-3 Ordering Relationships Selected by Mask

Ordering Relation,   Assembly Language   Effective Behavior   Mask     mmask
Earlier <m Later     Constant Mnemonic   in TSO Model         Value    Bit #
Load <m Load         #LoadLoad           nop                  01₁₆     0
Store <m Load        #StoreLoad          #StoreLoad           02₁₆     1
Load <m Store        #LoadStore          nop                  04₁₆     2
Store <m Store       #StoreStore         nop                  08₁₆     3
Implementation Note: An UltraSPARC Architecture 2005 implementation that only
implements the TSO memory model may implement MEMBAR #LoadLoad,
MEMBAR #LoadStore, and MEMBAR #StoreStore as nops and MEMBAR #StoreLoad
as a MEMBAR #Sync.


9.5.5.2 Sequencing MEMBAR Instructions 


A sequencing MEMBAR exerts explicit control over the completion of operations. 
The three sequencing MEMBAR options each have a different degree of control and 
a different application. 


• Lookaside Barrier: Ensures that loads following this MEMBAR are from
memory and not from a lookaside into a write buffer. Lookaside Barrier requires
that pending stores issued prior to the MEMBAR be completed before any load
from that address following the MEMBAR may be issued. A Lookaside Barrier
MEMBAR may be needed to provide lock fairness and to support some plausible
I/O location semantics. See the example in "Control and Status Registers" in
Programming with the Memory Models, contained in the separate volume
UltraSPARC Architecture Application Notes.


• Memory Issue Barrier: Ensures that all memory operations appearing in
program order before the sequencing MEMBAR complete before any new
memory operation may be initiated. See the example in "I/O Registers with Side
Effects" in Programming with the Memory Models, contained in the separate volume
UltraSPARC Architecture Application Notes.


• Synchronization Barrier: Ensures that all instructions (memory reference and
others) preceding the MEMBAR complete and that the effects of any fault or error
have become visible before any instruction following the MEMBAR in program
order is initiated. A Synchronization Barrier MEMBAR fully synchronizes the
virtual processor that issues it.


TABLE 9-4 shows the encoding of these functions in the MEMBAR instruction. 
TABLE 9-4 Sequencing Barrier Selected by Mask

Sequencing Function        Assembler Tag   Mask Value   cmask Bit #
Lookaside Barrier          #Lookaside      10₁₆         0
Memory Issue Barrier       #MemIssue       20₁₆         1
Synchronization Barrier    #Sync           40₁₆         2


Implementation Note: In UltraSPARC Architecture 2005 implementations,
MEMBAR #Lookaside and MEMBAR #MemIssue are typically implemented as a
MEMBAR #Sync.
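The ordering (mmask) and sequencing (cmask) fields are simply bits of one 7-bit mask, so decoding a MEMBAR operand reduces to a loop over seven bits. This C sketch converts a mask into assembler tags in the order given by TABLE 9-3 and TABLE 9-4. It also applies the two implementation notes to report whether a TSO-only implementation could treat the MEMBAR as a nop. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assembler tags by mask bit: mmask bits 0-3 (TABLE 9-3), then cmask bits 0-2 (TABLE 9-4). */
static const char *const membar_tag[7] = {
    "#LoadLoad", "#StoreLoad", "#LoadStore", "#StoreStore",
    "#Lookaside", "#MemIssue", "#Sync"
};

/* Writes the operand for a 7-bit MEMBAR mask into out,
   e.g. 0x42 -> "#StoreLoad | #Sync". */
static void membar_operand(unsigned mask, char *out, size_t out_len) {
    assert(mask <= 0x7Fu && out != NULL && out_len > 0);
    out[0] = '\0';
    for (unsigned bit = 0; bit < 7; bit++) {
        if (!(mask & (1u << bit)))
            continue;
        if (out[0] != '\0')
            strncat(out, " | ", out_len - strlen(out) - 1);
        strncat(out, membar_tag[bit], out_len - strlen(out) - 1);
    }
}

/* Per the two implementation notes, a TSO-only implementation need not do
   anything unless #StoreLoad (0x02) or a cmask bit (0x10, 0x20, 0x40) is set. */
static bool membar_is_nop_under_tso(unsigned mask) {
    return (mask & 0x72u) == 0;
}
```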





For more details, see the MEMBAR instruction on page 275 of Chapter 7, Instructions. 


9.5.5.3 Synchronizing Instruction and Data Memory 


The SPARC V9 memory models do not require that instruction and data memory 
images be consistent at all times. The instruction and data memory images may 
become inconsistent if a program writes into the instruction stream. As a result, 
whenever instructions are modified by a program in a context where the data (that 
is, the instructions) in the memory and the data cache hierarchy may be inconsistent 
with instructions in the instruction cache hierarchy, some special programmatic 
(software) action must be taken. 


The FLUSH instruction will ensure consistency between the in-flight instruction 
stream and the data references in the virtual processor executing FLUSH. The 
programmer must ensure that the modification sequence is robust under multiple 
updates and concurrent execution. Since, in general, loads and stores may be 
performed out of order, appropriate MEMBAR and FLUSH instructions must be 
interspersed as needed to control the order in which the instruction data are 
modified. 


The FLUSH instruction ensures that subsequent instruction fetches from the 
doubleword target of the FLUSH by the virtual processor executing the FLUSH 
appear to execute after any loads, stores, and atomic load-stores issued by the virtual 
processor to that address prior to the FLUSH. FLUSH acts as a barrier for instruction 
fetches in the virtual processor on which it executes and has the properties of a store 
with respect to MEMBAR operations. 
IMPL. DEP. #122-V9: The latency between the execution of FLUSH on one virtual
processor and the point at which the modified instructions have replaced outdated 
instructions in a multiprocessor is implementation dependent. 


Programming Note: Because FLUSH is designed to act on a doubleword and
because, on some implementations, FLUSH may trap to system
software, it is recommended that system software provide a
user-callable service routine for flushing arbitrarily sized regions
of memory. On some implementations, this routine would issue
a series of FLUSH instructions; on others, it might issue a single
trap to system software that would then flush the entire region.
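Here is a minimal C sketch of such a service routine, assuming the one-FLUSH-per-doubleword strategy. The per-doubleword hook (flush_dw_fn) and all other names are hypothetical. On real hardware, the hook would issue a single FLUSH. Here it is a callback, so the address arithmetic can be checked on any machine. The routine supplies the address of every doubleword that overlaps the modified region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: on SPARC this would execute one FLUSH on the doubleword. */
typedef void (*flush_dw_fn)(uintptr_t doubleword_addr, void *ctx);

/* Calls flush_dw once for each 8-byte doubleword overlapping [start, start + len).
   Returns the number of doublewords flushed. */
static size_t flush_range(uintptr_t start, size_t len, flush_dw_fn flush_dw, void *ctx) {
    assert(flush_dw);
    if (len == 0)
        return 0;
    uintptr_t first = start & ~(uintptr_t)7;            /* doubleword holding the first byte */
    uintptr_t last = (start + len - 1) & ~(uintptr_t)7; /* doubleword holding the last byte */
    size_t count = 0;
    for (uintptr_t dw = first;; dw += 8) {
        flush_dw(dw, ctx);
        count++;
        if (dw == last)
            break;
    }
    return count;
}

/* Example hook for testing: records the addresses it is asked to flush. */
struct flush_log { uintptr_t addr[16]; size_t count; };

static void record_flush(uintptr_t doubleword_addr, void *ctx) {
    struct flush_log *rec = ctx;
    if (rec->count < 16)
        rec->addr[rec->count] = doubleword_addr;
    rec->count++;
}
```

For example, modifying the 10 bytes starting at 1003₁₆ touches two doublewords, so the routine flushes 1000₁₆ and 1008₁₆.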


On an UltraSPARC Architecture virtual processor: 


• A FLUSH instruction causes a synchronization with the virtual processor, which
flushes the instruction pipeline in the virtual processor on which the FLUSH
instruction is executed.


• Coherency between instruction and data memories may or may not be
maintained by hardware. If it is, an UltraSPARC Architecture implementation
may ignore the address in the operands of a FLUSH instruction.


Programming Note: UltraSPARC Architecture virtual processors are not required to
maintain coherency between instruction and data caches in
hardware. Therefore, portable software must:


(1) always assume that store instructions (except Block
Store with Commit) do not coherently update instruction
cache(s);


(2) in every FLUSH instruction, supply the address of the
instruction or instructions that were modified.


For more details, see the FLUSH instruction on page 188 of Chapter 7, Instructions. 





9.6 Nonfaulting Load


A nonfaulting load behaves like a normal load, with the following exceptions:


• A nonfaulting load from a location with side effects (TTE.e = 1) causes a
data_access_exception exception.


• A nonfaulting load from a page marked for nonfault access only (TTE.nfo = 1) is
allowed; other types of accesses to such a page cause a data_access_exception
exception.


• These loads are issued with ASI_PRIMARY_NO_FAULT[_LITTLE] or
ASI_SECONDARY_NO_FAULT[_LITTLE]. A store with a NO_FAULT ASI causes a
data_access_exception exception.
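The three rules above condense into a single predicate. This C sketch models only those rules. It takes the TTE e and nfo bits and whether the access uses a NO_FAULT ASI, and it ignores every other reason an access might trap, such as protection violations or translation misses. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The two TTE attribute bits that the rules above depend on. */
struct tte_bits {
    bool e;   /* side effects */
    bool nfo; /* nonfault access only */
};

/* true if the rules above raise data_access_exception for this access. */
static bool nf_rules_trap(bool is_store, bool no_fault_asi, struct tte_bits tte) {
    if (no_fault_asi)
        return is_store || tte.e; /* NO_FAULT store, or nonfaulting load with side effects */
    return tte.nfo;               /* normal load or store to a nonfault-only page */
}
```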
Typically, optimizers use nonfaulting loads to move loads across conditional control
structures that guard their use. This technique potentially increases the distance
between a load of data and the first use of that data, in order to hide latency. The
technique allows more flexibility in instruction scheduling and improves
performance in certain algorithms by removing address checking from the critical
code path.


For example, when following a linked list, nonfaulting loads allow the null pointer
to be accessed safely in a speculative, read-ahead fashion; the page at virtual address
0₁₆ can safely be accessed with no penalty¹. The TTE.nfo bit marks pages that are
mapped for safe access by nonfaulting loads but that can still cause a trap by other,
normal accesses.


Thus, programmers can trap on "wild" pointer references (many programmers
count on an exception being generated when accessing address 0₁₆ to debug
software) while benefiting from the acceleration of nonfaulting access in debugged
library routines.


9.7 Store Coalescing


Cacheable stores may be coalesced with adjacent cacheable stores within an 8-byte
boundary offset in the store buffer to improve store bandwidth. Similarly,
non-side-effect noncacheable stores may be coalesced with adjacent non-side-effect
noncacheable stores within an 8-byte boundary offset in the store buffer.


In order to maintain strong ordering for I/O accesses, stores with the side-effect
attribute (e bit set) will not be combined with any other stores.


Stores that are separated by an intervening MEMBAR #Sync will not be coalesced. 
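The coalescing rules can be expressed as a predicate over two pending stores. This C sketch is illustrative only. The struct and function names are hypothetical. "Within an 8-byte boundary" is interpreted here as "both stores fall inside the same 8-byte-aligned doubleword", and adjacency within the store buffer is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pending_store {
    uint64_t pa;      /* address of the first byte written */
    unsigned size;    /* bytes written: 1 to 8 */
    bool cacheable;
    bool side_effect; /* TTE e bit */
};

/* true if the two stores may be coalesced under the rules above. */
static bool may_coalesce(struct pending_store a, struct pending_store b,
                         bool membar_sync_between) {
    assert(a.size && b.size && a.size <= 8 && b.size <= 8);
    if (membar_sync_between || a.side_effect || b.side_effect)
        return false;                 /* never across #Sync, never with side effects */
    if (a.cacheable != b.cacheable)
        return false;                 /* cacheable only with cacheable */
    uint64_t dw = a.pa & ~UINT64_C(7);
    return (b.pa & ~UINT64_C(7)) == dw
        && ((a.pa + a.size - 1) & ~UINT64_C(7)) == dw
        && ((b.pa + b.size - 1) & ~UINT64_C(7)) == dw;
}
```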


¹ Other than the impact of occupying TLB entries.
CHAPTER 10


Address Space Identifiers (ASIs)


This chapter describes address space identifiers (ASIs) in the following sections:


• Address Space Identifiers and Address Spaces on page 421.
• ASI Values on page 421.
• ASI Assignments on page 422.
• Special Memory Access ASIs on page 436.





10.1 Address Space Identifiers and Address Spaces


An UltraSPARC Architecture processor provides an address space identifier (ASI)
with every address sent to memory. The ASI does the following:


• Distinguishes between different address spaces
• Provides an attribute that is unique to an address space
• Maps internal control and diagnostics registers within a virtual processor


The memory management unit uses a 64-bit virtual address and an 8-bit ASI to 
generate a memory, I/O, or internal register address. This physical address space 
can be accessed through virtual-to-physical address mapping or through the MMU 
bypass mode. 





10.2 ASI Values


The range of address space identifiers (ASIs) is 00₁₆–FF₁₆. That range is divided into
restricted and unrestricted portions. ASIs in the range 80₁₆–FF₁₆ are unrestricted;
they may be accessed by software running in any privilege mode.


ASIs in the range 00₁₆–7F₁₆ are restricted; they may only be accessed by software
running in a mode with sufficient privilege for the particular ASI. ASIs in the range
00₁₆–2F₁₆ may only be accessed by software running in privileged or
hyperprivileged mode, and ASIs in the range 30₁₆–7F₁₆ may only be accessed by
software running in hyperprivileged mode.


SPARC V9 Compatibility Note: In SPARC V9, the range of ASIs was evenly divided into
restricted (00₁₆–7F₁₆) and unrestricted (80₁₆–FF₁₆) halves.


An attempt by nonprivileged software to access a restricted (privileged or
hyperprivileged) ASI (00₁₆–7F₁₆) causes a privileged_action trap.


An attempt by privileged software to access a hyperprivileged ASI (30₁₆–7F₁₆) also
causes a privileged_action trap.
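The ASI ranges and trap rules above translate into a short privilege check. In this C sketch, the mode enumeration and function names are hypothetical. Only the numeric ranges and the privileged_action outcome come from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ordered from least to most privileged. */
typedef enum { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED } priv_mode;

/* Least-privileged mode allowed to use an ASI, per the ranges in this section. */
static priv_mode asi_min_mode(uint8_t asi) {
    if (asi >= 0x80)
        return MODE_NONPRIVILEGED;   /* 80-FF: unrestricted */
    if (asi >= 0x30)
        return MODE_HYPERPRIVILEGED; /* 30-7F: hyperprivileged */
    return MODE_PRIVILEGED;          /* 00-2F: privileged or hyperprivileged */
}

/* true if an access to `asi` from `mode` causes a privileged_action trap. */
static bool asi_access_traps(uint8_t asi, priv_mode mode) {
    return mode < asi_min_mode(asi);
}
```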


Based on how it affects the MMU's treatment of the accompanying address, an ASI
falls into one of three categories:


• A Virtual-Translating ASI (the most common type) causes the accompanying
address to be treated as a virtual address (which is translated by the MMU into a
physical address).


• A Non-Translating ASI is not translated by the MMU; instead, the address is passed
through unchanged. Non-Translating ASIs are typically used for accessing internal
registers.


• A Real-Translating ASI causes the accompanying address to be treated as a real
address (which is translated by the MMU into a physical address). An access
using a Real-Translating ASI can cause exceptions that are visible only in
hyperprivileged mode (such as a PA watchpoint exception). Real-Translating
ASIs are typically used by privileged or hyperprivileged software for directly
accessing memory using real or physical (as opposed to virtual) addresses.


Implementation-dependent ASIs may or may not be translated by the MMU. See 
implementation-specific documentation for detailed information about 
implementation-dependent ASIs. 





10.3 ASI Assignments 


Every load or store address in an UltraSPARC Architecture processor has an 8-bit 
Address Space Identifier (ASI) appended to the virtual address (VA). The VA plus 
the ASI fully specify the address. 


For instruction fetches and for data loads, stores, and load-stores that do not use the 
load or store alternate instructions, the ASI is an implicit ASI generated by the 
virtual processor. 
If a load alternate, store alternate, or load-store alternate instruction is used, the
value of the ASI (an "explicit ASI") can be specified in the ASI register or as an
immediate value in the instruction.


In practice, ASIs are used not only to differentiate address spaces but also for other
functions, such as referencing registers in the MMU.


10.3.1 Supported ASIs


TABLE 10-1 lists architecturally-defined ASIs; some are in all UltraSPARC Architecture 
implementations and some are only present in some implementations. 


An ASI marked with a closed bullet (●) is required to be implemented on all
UltraSPARC Architecture 2005 processors. 


An ASI marked with an open bullet (O) is defined by the UltraSPARC Architecture 
2005 but is not necessarily implemented in all UltraSPARC Architecture 2005 
processors; its implementation is optional. Across all implementations on which it is
implemented, it appears to software to behave identically. 


Some ASIs may only be used with certain load or store instructions; see table 
footnotes for details. 


The word "decoded" in the Virtual Address column of TABLE 10-1 indicates that the
supplied virtual address is decoded by the virtual processor.


The "TVP / non-T / TRP" column of the table indicates whether each ASI is a
Virtual-Translating ASI (translates Virtual-to-Physical), a non-Translating ASI, or a
Real-Translating ASI (translates Real-to-Physical), respectively.


ASIs marked "Reserved" are set aside for use in future revisions to the architecture
and are not to be used by implementations. ASIs marked "implementation
dependent" may be used for implementation-specific purposes.


Attempting to access an address space described as "Implementation dependent" in 
TABLE 10-1 produces implementation-dependent results. 

















TABLE 10-1 UltraSPARC Architecture ASIs (1 of 12)

ASI Value | Req'd (●) / Opt'l (O) | ASI Name (and Abbreviation) | Access Type(s) | Virtual Address (VA) | TVP / non-T / TRP | Shared / per strand | Description
00₁₆–03₁₆ | O | - | - | - | - | - | Implementation dependent
04₁₆ | ● | ASI_NUCLEUS (ASI_N) | RW | (decoded) | TVP | - | Implicit address space, nucleus context, TL > 0
05₁₆–0B₁₆ | O | - | - | - | - | - | Implementation dependent





CHAPTER 10 * Address Space Identifiers (ASIs) 423 


TABLE 10-1 UltraSPARC Architecture ASIs (2 of 12)

0C₁₆ | ● | ASI_NUCLEUS_LITTLE (ASI_NL) | RW | (decoded) | TVP | - | Implicit address space, nucleus context, TL > 0, little-endian
0D₁₆–0F₁₆ | O | - | - | - | - | - | Implementation dependent
10₁₆ | ● | ASI_AS_IF_USER_PRIMARY (ASI_AIUP) | RW | (decoded) | TVP | - | Primary address space, as if user (nonprivileged)
11₁₆ | ● | ASI_AS_IF_USER_SECONDARY (ASI_AIUS) | RW | (decoded) | TVP | - | Secondary address space, as if user (nonprivileged)
12₁₆–13₁₆ | O | - | - | - | - | - | Implementation dependent
14₁₆ | O | ASI_REAL | RW | (decoded) | TRP | - | Real address
15₁₆ | O | ASI_REAL_IO | RW | (decoded) | TRP | - | Real address, noncacheable, with side effect (deprecated)
16₁₆ | O | ASI_BLOCK_AS_IF_USER_PRIMARY (ASI_BLK_AIUP) | RW | (decoded) | TVP | - | Primary address space, block load/store, as if user (nonprivileged)
17₁₆ | O | ASI_BLOCK_AS_IF_USER_SECONDARY (ASI_BLK_AIUS) | RW | (decoded) | TVP | - | Secondary address space, block load/store, as if user (nonprivileged)
18₁₆ | ● | ASI_AS_IF_USER_PRIMARY_LITTLE (ASI_AIUPL) | RW | (decoded) | TVP | - | Primary address space, as if user (nonprivileged), little-endian
19₁₆ | ● | ASI_AS_IF_USER_SECONDARY_LITTLE (ASI_AIUSL) | RW | (decoded) | TVP | - | Secondary address space, as if user (nonprivileged), little-endian
1A₁₆–1B₁₆ | O | - | - | - | - | - | Implementation dependent
1C₁₆ | O | ASI_REAL_LITTLE (ASI_REAL_L) | RW | (decoded) | TRP | - | Real address, little-endian
1D₁₆ | O | ASI_REAL_IO_LITTLE (ASI_REAL_IO_L) | RW | (decoded) | TRP | - | Real address, noncacheable, with side effect, little-endian (deprecated)
1E₁₆ | O | ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE (ASI_BLK_AIUPL) | RW | (decoded) | TVP | - | Primary address space, block load/store, as if user (nonprivileged), little-endian
1F₁₆ | O | ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE (ASI_BLK_AIUSL) | RW | (decoded) | TVP | - | Secondary address space, block load/store, as if user (nonprivileged), little-endian
TABLE 10-1 UltraSPARC Architecture ASIs (3 of 12)

20₁₆ | ● | ASI_SCRATCHPAD | RW | (decoded; see below) | non-T | per strand | Privileged Scratchpad registers; implementation dependent
    VA 0₁₆ (O): Scratchpad Register 0
    VA 8₁₆ (O): Scratchpad Register 1
    VA 10₁₆ (O): Scratchpad Register 2
    VA 18₁₆ (O): Scratchpad Register 3
    VA 20₁₆ (O): Scratchpad Register 4
    VA 28₁₆ (O): Scratchpad Register 5
    VA 30₁₆ (O): Scratchpad Register 6
    VA 38₁₆ (O): Scratchpad Register 7
21₁₆ | O | ASI_MMU_CONTEXTID | RW | (decoded; see below) | non-T | per strand | MMU context registers
    VA 8₁₆ (O): I/D MMU Primary Context ID register
    VA 10₁₆ (O): I/D MMU Secondary Context ID register
22₁₆ | O | ASI_TWINX_AS_IF_USER_PRIMARY (ASI_TWINX_AIUP) | R | (decoded) | TVP | - | Primary address space, 128-bit atomic load twin extended word, as if user (nonprivileged)
23₁₆ | ● | ASI_TWINX_AS_IF_USER_SECONDARY (ASI_TWINX_AIUS) | R | (decoded) | TVP | - | Secondary address space, 128-bit atomic load twin extended word, as if user (nonprivileged)
24₁₆ | O | - | - | - | - | - | Implementation dependent
TABLE 10-1 UltraSPARC Architecture ASIs (4 of 12)

25₁₆ | O | ASI_QUEUE | (see below) | (decoded; see below) | non-T | per strand | -
    VA 3C0₁₆ (O, RW): CPU Mondo Queue Head Pointer
    VA 3C8₁₆ (O, RW): CPU Mondo Queue Tail Pointer
    VA 3D0₁₆ (O, RW): Device Mondo Queue Head Pointer
    VA 3D8₁₆ (O, RW): Device Mondo Queue Tail Pointer
    VA 3E0₁₆ (O, RW): Resumable Error Queue Head Pointer
    VA 3E8₁₆ (O, RW): Resumable Error Queue Tail Pointer
    VA 3F0₁₆ (O, RW): Nonresumable Error Queue Head Pointer
    VA 3F8₁₆ (O, RW): Nonresumable Error Queue Tail Pointer
26₁₆ | O | ASI_TWINX_REAL (ASI_TWINX_R), ASI_QUAD_LDD_REAL | R | (decoded) | TRP | - | 128-bit atomic twin extended-word load from real address
27₁₆ | ● | ASI_TWINX_NUCLEUS (ASI_TWINX_N) | R | (decoded) | TVP | - | Nucleus context, 128-bit atomic load twin extended-word
28₁₆–29₁₆ | O | - | - | - | - | - | Implementation dependent
2A₁₆ | O | ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE (ASI_TWINX_AIUPL) | R | (decoded) | TVP | - | Primary address space, 128-bit atomic load twin extended-word, as if user (nonprivileged), little-endian
2B₁₆ | O | ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE (ASI_TWINX_AIUSL) | R | (decoded) | TVP | - | Secondary address space, 128-bit atomic load twin extended-word, as if user (nonprivileged), little-endian
2C₁₆ | O | - | - | - | - | - | Implementation dependent
2D₁₆ | O | - | - | - | - | - | Implementation dependent
2E₁₆ | ● | ASI_TWINX_REAL_LITTLE (ASI_TWINX_REAL_L), ASI_QUAD_LDD_REAL_LITTLE | R | (decoded) | TRP | - | 128-bit atomic twin-extended-word load from real address, little-endian
TABLE 10-1 UltraSPARC Architecture ASIs (5 of 12)

2F₁₆ | ● | ASI_TWINX_NUCLEUS_LITTLE (ASI_TWINX_NL) | R | (decoded) | TVP | - | Nucleus context, 128-bit atomic load twin extended-word, little-endian
30₁₆–3C₁₆ | O | - | - | - | - | - | Implementation dependent
3D₁₆ | O | - | - | - | - | - | Implementation dependent
3E₁₆ | - | - | - | - | - | - | Reserved
3F₁₆–40₁₆ | O | - | - | - | - | - | Implementation dependent
41₁₆ | O | ASI_CMT_SHARED | (see below) | (decoded; see below) | non-T | shared | CMT control/status (shared)
    VA 00₁₆ (O, R): Virtual Processor (strand) Available Register
    VA 10₁₆ (O, R): Virtual Processor (strand) Enable Status Register
    VA 20₁₆ (O, RW): Virtual Processor (strand) Enable Register
    VA 30₁₆ (O, RW): XIR Steering Register
    Implementation dependent (impl. dep. #1105)
    VA 50₁₆ (●, RW): Virtual Processor (strand) Running Register, general access
    VA 58₁₆ (●, R): Virtual Processor (strand) Running Status Register
    VA 60₁₆ (O, W): Virtual Processor (strand) Running Register, general access. Write '1' to set bit
    VA 68₁₆ (O, W): Virtual Processor (strand) Running Register, general access. Write '1' to clear bit
42₁₆–44₁₆ | O | - | - | - | - | - | Implementation dependent
45₁₆ | O | - | - | - | - | - | Implementation dependent
46₁₆–48₁₆ | O | - | - | - | - | - | Implementation dependent
49₁₆ | O | - | - | - | - | - | Implementation dependent
4A₁₆–4B₁₆ | O | - | - | - | - | - | Implementation dependent
TABLE 10-1 UltraSPARC Architecture ASIs (6 of 12)

4C₁₆ | O | Error Status and Enable Registers | - | - | - | - | Implementation dependent
4D₁₆–4E₁₆ | O | - | - | - | - | - | Implementation dependent
4F₁₆ | O | ASI_HYP_SCRATCHPAD | RW | (decoded; see below) | non-T | per strand | Hyperprivileged Scratchpad registers; implementation dependent
    VA 0₁₆ (O): Hyperprivileged Scratchpad Register 0
    VA 8₁₆ (O): Hyperprivileged Scratchpad Register 1
    VA 10₁₆ (O): Hyperprivileged Scratchpad Register 2
    VA 18₁₆ (O): Hyperprivileged Scratchpad Register 3
    VA 20₁₆ (O): Hyperprivileged Scratchpad Register 4
    VA 28₁₆ (O): Hyperprivileged Scratchpad Register 5
    VA 30₁₆ (O): Hyperprivileged Scratchpad Register 6
    VA 38₁₆ (O): Hyperprivileged Scratchpad Register 7
50₁₆ | O | ASI_IMMU | - | (decoded; see below) | non-T | per strand | IMMU registers
    VA 0₁₆ (O, R): IMMU tag target register
    VA 18₁₆ (O, RW): Instruction fault status register (SFSR)
    VA 30₁₆ (O, RW): ITLB tag access register
52₁₆ | O | ASI_MMU_REAL | RW | (see below) | non-T | per strand | MMU registers
    VA 108₁₆ (O): MMU Real Range
    VA 110₁₆ (O): MMU Real Range
    VA 118₁₆ (O): MMU Real Range
    VA 120₁₆ (O): MMU Real Range
    VA 208₁₆ (O): MMU Physical Address Offset Registers
TABLE 10-1 UltraSPARC Architecture ASIs (7 of 12)

    VA 210₁₆ (O): MMU Physical Address Offset Registers
    VA 218₁₆ (O): MMU Physical Address Offset Registers
    VA 220₁₆ (O): MMU Physical Address Offset Registers
53₁₆ | O | - | - | - | - | - | Implementation dependent
TABLE 10-1 UltraSPARC Architecture ASIs (8 of 12)

54₁₆ | O | ASI_MMU | (see below) | (decoded; see below) | non-T | per strand | (more) MMU registers
    VA 0₁₆ (O, W): ITLB data in register
    VA 10₁₆ (O, RW): Context Zero TSB Configuration register 0
    VA 18₁₆ (O, RW): Context Zero TSB Configuration register 1
    VA 20₁₆ (O, RW): Context Zero TSB Configuration register 2
    VA 28₁₆ (O, RW): Context Zero TSB Configuration register 3
    VA 30₁₆ (O, RW): Context Nonzero TSB Configuration register 0
    VA 38₁₆ (O, RW): Context Nonzero TSB Configuration register 1
    VA 40₁₆ (O, RW): Context Nonzero TSB Configuration register 2
    VA 48₁₆ (O, RW): Context Nonzero TSB Configuration register 3
    VA 50₁₆ (O, RW): Instruction TSB Pointer register 0
    VA 58₁₆ (O, RW): Instruction TSB Pointer register 1
    VA 60₁₆ (O, RW): Instruction TSB Pointer register 2
    VA 68₁₆ (O, RW): Instruction TSB Pointer register 3
    VA 70₁₆ (O, RW): Data/Unified TSB Pointer register 0
    VA 78₁₆ (O, RW): Data/Unified TSB Pointer register 1
    VA 80₁₆ (O, RW): Data/Unified TSB Pointer register 2
    VA 88₁₆ (O, RW): Data/Unified TSB Pointer register 3
55₁₆ | O | ASI_ITLB_DATA_ACCESS_REG | RW | 0₁₆–3F8₁₆; 800₁₆–7FFF8₁₆ | non-T | per strand | IMMU TLB data access register
FFFF846 strand 
57₁₆ | ASI_IMMU_DEMAP | W | 0₁₆ | non-T | per strand | IMMU TLB demap
58₁₆ | ASI_DMMU / ASI_UMMU | (see below) | (decoded; see below) | non-T | - | Data or Unified MMU registers
     |  | R  | 0₁₆  | non-T | per strand | D/U TSB tag target register
     |  | RW | 18₁₆ | non-T | per strand | Data error status register (DSFSR)
     |  | R  | 20₁₆ | non-T | per core   | Data error address register (DSFAR)
     |  | RW | 30₁₆ | non-T | per core   | D/U TLB tag access register
     |  | RW | 38₁₆ | non-T | per strand | VA instruction, and PA/VA data watchpoint register
     |  | RW | 80₁₆ | non-T | per strand | I/D/U MMU partition ID register
59₁₆–5B₁₆ | - | - | - | - | - | Reserved
5C₁₆ | ASI_DTLB_DATA_IN_REG | W | 0₁₆ | non-T | per strand | D/U TLB data in register
5D₁₆ | ASI_DTLB_DATA_ACCESS_REG | RW | 0₁₆–3F8₁₆; 800₁₆–7FFF8₁₆ | non-T | per strand | D/U TLB data access register
5E₁₆ | ASI_DTLB_TAG_READ_REG | R | 0₁₆–FFFF8₁₆ | non-T | per strand | D/U TLB tag read register
5F₁₆ | ASI_DMMU_DEMAP | W | 0₁₆ | non-T | per strand | D/U TLB demap
60₁₆–62₁₆ | - | - | - | - | - | Implementation dependent
63₁₆ | ASI_CMT_PER_STRAND, ASI_CMT_PER_CORE | (see below) | (decoded; see below) | non-T | per strand | CMT control/status (per strand)
     |  | RW | 0₁₆  | non-T | per strand | Virtual Processor (strand) Interrupt ID
     |  | R  | 10₁₆ | non-T | per strand | Virtual Processor (strand) ID
64₁₆–67₁₆ | - | - | - | - | - | Implementation dependent
68₁₆–7F₁₆ | - | - | - | - | - | Reserved
80₁₆ | ASI_PRIMARY (ASI_P) | RW | (decoded) | TVP | - | Implicit primary address space
81₁₆ | ASI_SECONDARY (ASI_S) | RW | (decoded) | TVP | - | Secondary address space
82₁₆ | ASI_PRIMARY_NO_FAULT (ASI_PNF) | R | (decoded) | TVP | - | Primary address space, no fault
83₁₆ | ASI_SECONDARY_NO_FAULT (ASI_SNF) | R | (decoded) | TVP | - | Secondary address space, no fault
84₁₆–87₁₆ | - | - | - | - | - | Reserved
88₁₆ | ASI_PRIMARY_LITTLE (ASI_PL) | RW | (decoded) | TVP | - | Implicit primary address space, little-endian
89₁₆ | ASI_SECONDARY_LITTLE (ASI_SL) | RW | (decoded) | TVP | - | Secondary address space, little-endian
8A₁₆ | ASI_PRIMARY_NO_FAULT_LITTLE (ASI_PNFL) | R | (decoded) | TVP | - | Primary address space, no fault, little-endian
8B₁₆ | ASI_SECONDARY_NO_FAULT_LITTLE (ASI_SNFL) | R | (decoded) | TVP | - | Secondary address space, no fault, little-endian
8C₁₆–BF₁₆ | - | - | - | - | - | Reserved
C0₁₆ | ASI_PST8_PRIMARY (ASI_PST8_P) | W | (decoded) | TVP | - | Primary address space, 8x8-bit partial store
C1₁₆ | ASI_PST8_SECONDARY (ASI_PST8_S) | W | (decoded) | TVP | - | Secondary address space, 8x8-bit partial store
C2₁₆ | ASI_PST16_PRIMARY (ASI_PST16_P) | W | (decoded) | TVP | - | Primary address space, 4x16-bit partial store
C3₁₆ | ASI_PST16_SECONDARY (ASI_PST16_S) | W | (decoded) | TVP | - | Secondary address space, 4x16-bit partial store
C4₁₆ | ASI_PST32_PRIMARY (ASI_PST32_P) | W | (decoded) | TVP | - | Primary address space, 2x32-bit partial store
C5₁₆ | ASI_PST32_SECONDARY (ASI_PST32_S) | W | (decoded) | TVP | - | Secondary address space, 2x32-bit partial store
C6₁₆–C7₁₆ | - | - | - | - | - | Implementation dependent
C8₁₆ | ASI_PST8_PRIMARY_LITTLE (ASI_PST8_PL) | W | (decoded) | TVP | - | Primary address space, 8x8-bit partial store, little-endian
C9₁₆ | ASI_PST8_SECONDARY_LITTLE (ASI_PST8_SL) | W | (decoded) | TVP | - | Secondary address space, 8x8-bit partial store, little-endian
CA₁₆ | ASI_PST16_PRIMARY_LITTLE (ASI_PST16_PL) | W | (decoded) | TVP | - | Primary address space, 4x16-bit partial store, little-endian
CB₁₆ | ASI_PST16_SECONDARY_LITTLE (ASI_PST16_SL) | W | (decoded) | TVP | - | Secondary address space, 4x16-bit partial store, little-endian
CC₁₆ | ASI_PST32_PRIMARY_LITTLE (ASI_PST32_PL) | W | (decoded) | TVP | - | Primary address space, 2x32-bit partial store, little-endian
CD₁₆ | ASI_PST32_SECONDARY_LITTLE (ASI_PST32_SL) | W | (decoded) | TVP | - | Secondary address space, 2x32-bit partial store, little-endian
CE₁₆–CF₁₆ | - | - | - | - | - | Implementation dependent
D0₁₆ | ASI_FL8_PRIMARY (ASI_FL8_P) | RW | (decoded) | TVP | - | Primary address space, one 8-bit floating-point load/store
D1₁₆ | ASI_FL8_SECONDARY (ASI_FL8_S) | RW | (decoded) | TVP | - | Secondary address space, one 8-bit floating-point load/store
D2₁₆ | ASI_FL16_PRIMARY (ASI_FL16_P) | RW | (decoded) | TVP | - | Primary address space, one 16-bit floating-point load/store
D3₁₆ | ASI_FL16_SECONDARY (ASI_FL16_S) | RW | (decoded) | TVP | - | Secondary address space, one 16-bit floating-point load/store
D4₁₆–D7₁₆ | - | - | - | - | - | Implementation dependent
D8₁₆ | ASI_FL8_PRIMARY_LITTLE (ASI_FL8_PL) | RW | (decoded) | TVP | - | Primary address space, one 8-bit floating-point load/store, little-endian
D9₁₆ | ASI_FL8_SECONDARY_LITTLE (ASI_FL8_SL) | RW | (decoded) | TVP | - | Secondary address space, one 8-bit floating-point load/store, little-endian
DA₁₆ | ASI_FL16_PRIMARY_LITTLE (ASI_FL16_PL) | RW | (decoded) | TVP | - | Primary address space, one 16-bit floating-point load/store, little-endian
DB₁₆ | ASI_FL16_SECONDARY_LITTLE (ASI_FL16_SL) | RW | (decoded) | TVP | - | Secondary address space, one 16-bit floating-point load/store, little-endian
DC₁₆–DF₁₆ | - | - | - | - | - | Implementation dependent
E0₁₆–E1₁₆ | - | - | - | - | - | Reserved
E2₁₆ | ASI_TWINX_PRIMARY (ASI_TWINX_P) | R | (decoded) | TVP | - | Primary address space, 128-bit atomic load twin extended word
E3₁₆ | ASI_TWINX_SECONDARY (ASI_TWINX_S) | R | (decoded) | TVP | - | Secondary address space, 128-bit atomic load twin extended word
E4₁₆–E9₁₆ | - | - | - | - | - | Implementation dependent
EA₁₆ | ASI_TWINX_PRIMARY_LITTLE (ASI_TWINX_PL) | R | (decoded) | TVP | - | Primary address space, 128-bit atomic load twin extended word, little-endian
EB₁₆ | ASI_TWINX_SECONDARY_LITTLE (ASI_TWINX_SL) | R | (decoded) | TVP | - | Secondary address space, 128-bit atomic load twin extended word, little-endian
EC₁₆–EF₁₆ | - | - | - | - | - | Implementation dependent
F0₁₆ | ASI_BLOCK_PRIMARY (ASI_BLK_P) | RW | (decoded) | TVP | - | Primary address space, 8x8-byte block load/store
F1₁₆ | ASI_BLOCK_SECONDARY (ASI_BLK_S) | RW | (decoded) | TVP | - | Secondary address space, 8x8-byte block load/store
F2₁₆–F5₁₆ | - | - | - | - | - | Implementation dependent
F6₁₆–F7₁₆ | - | - | - | - | - | Implementation dependent
F8₁₆ | ASI_BLOCK_PRIMARY_LITTLE (ASI_BLK_PL) | RW | (decoded) | TVP | - | Primary address space, 8x8-byte block load/store, little-endian
F9₁₆ | ASI_BLOCK_SECONDARY_LITTLE (ASI_BLK_SL) | RW | (decoded) | TVP | - | Secondary address space, 8x8-byte block load/store, little-endian
FA₁₆–FD₁₆ | - | - | - | - | - | Implementation dependent
FE₁₆–FF₁₆ | - | - | - | - | - | Implementation dependent
This ASI name has been changed, for consistency; although use of this name is
deprecated and software should use the new name, the old name is listed here for
compatibility.

This ASI was named ASI_DEVICE_ID+SERIAL_ID in older documents.

Implementation dependent ASI (impl. dep. #29); available for use by implementors.
Software that references this ASI may not be portable.

An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to this ASI in nonprivileged mode causes a privileged_action exception.

An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to this ASI in nonprivileged mode or privileged mode causes a
privileged_action exception.

May be used with all load alternate, store alternate, atomic alternate, and prefetch
alternate instructions (CASA, CASXA, LDSTUBA, LDTWA, LDDFA, LDFA, LDSBA,
LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA, PREFETCHA, STBA, STTWA,
STDFA, STFA, STHA, STWA, STXA, SWAPA).

May be used with all of the following load alternate and store alternate instructions:
LDTWA, LDDFA, LDFA, LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA,
STBA, STTWA, STDFA, STFA, STHA, STWA, STXA. Use with an atomic alternate or
prefetch alternate instruction (CASA, CASXA, LDSTUBA, SWAPA, or PREFETCHA)
causes a data_access_exception exception.

May only be used in an LDXA or STXA instruction for RW ASIs, in an LDXA instruction
for read-only ASIs, and in an STXA instruction for write-only ASIs. Use of LDXA for
write-only ASIs, STXA for read-only ASIs, or any other load alternate, store alternate,
atomic alternate, or prefetch alternate instruction causes a data_access_exception
exception.

May only be used in an LDTXA instruction. Use of this ASI in any other load alternate,
store alternate, atomic alternate, or prefetch alternate instruction causes a
data_access_exception exception.

May only be used in an LDDFA or STDFA instruction for RW ASIs, in an LDDFA
instruction for read-only ASIs, and in an STDFA instruction for write-only ASIs. Use of
LDDFA for write-only ASIs, STDFA for read-only ASIs, or any other load alternate,
store alternate, atomic alternate, or prefetch alternate instruction causes a
data_access_exception exception.

May be used with all of the following load and prefetch alternate instructions: LDTWA,
LDDFA, LDFA, LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA,
PREFETCHA. Use with an atomic alternate or store alternate instruction causes a
data_access_exception exception.

Write (store)-only ASI; an attempted load alternate, atomic alternate, or prefetch
alternate instruction to this ASI causes a data_access_exception exception.

Read (load)-only ASI; an attempted store alternate or atomic alternate instruction to this
ASI causes a data_access_exception exception.

An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to this ASI in privileged mode or hyperprivileged mode causes a
data_access_exception exception.

An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to this ASI in hyperprivileged mode causes a data_access_exception
exception if this ASI is not implemented by the specific implementation.

14 An attempted access to this ASI may cause an exception (see Special Memory Access ASIs
on page 436 for details).

15 An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to this ASI in any mode causes a data_access_exception exception if this ASI
is not implemented by the model-dependent implementation.

16 An attempted load alternate, store alternate, atomic alternate, or prefetch alternate
instruction to a reserved ASI in any mode causes a data_access_exception exception.

17 The Queue Tail Registers (ASI 25₁₆) are read-only by privileged software and read-write
by hyperprivileged software. An attempted write to the Queue Tail Registers by
privileged software causes a data_access_exception exception.

18 An access to a privileged page (TTE.p = 1) using an ASI_*AS_IF_USER* ASI causes a
data_access_exception exception.
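Several later subsections (10.4.7 through 10.4.13) single out groups of ASIs from TABLE 10-1 that are only legal with particular instructions. As an illustration only, the following C sketch classifies an 8-bit ASI value into those groups; the type and function names (asi_class_t, classify_asi) are hypothetical and are not defined by this specification. The ASI values themselves are taken from TABLE 10-1 and the subsections that follow.

```c
#include <stdint.h>

/* Hypothetical categories for special-purpose ASIs listed in TABLE 10-1. */
typedef enum {
    ASI_CLASS_OTHER = 0,
    ASI_CLASS_BLOCK,         /* 16, 17, 1E, 1F, F0, F1, F8, F9 (hex)           */
    ASI_CLASS_TWINX,         /* 22, 23, 26, 27, 2A, 2B, 2E, 2F, E2, E3, EA, EB */
    ASI_CLASS_PARTIAL_STORE, /* C0-C5, C8-CD (hex)                             */
    ASI_CLASS_SHORT_FP       /* D0-D3, D8-DB (hex)                             */
} asi_class_t;

/* Returns the category of an ASI value (illustrative helper, not an API). */
static asi_class_t classify_asi(uint8_t asi)
{
    switch (asi) {
    case 0x16: case 0x17: case 0x1E: case 0x1F:
    case 0xF0: case 0xF1: case 0xF8: case 0xF9:
        return ASI_CLASS_BLOCK;
    case 0x22: case 0x23: case 0x26: case 0x27:
    case 0x2A: case 0x2B: case 0x2E: case 0x2F:
    case 0xE2: case 0xE3: case 0xEA: case 0xEB:
        return ASI_CLASS_TWINX;
    default:
        break;
    }
    if ((asi >= 0xC0 && asi <= 0xC5) || (asi >= 0xC8 && asi <= 0xCD))
        return ASI_CLASS_PARTIAL_STORE;
    if ((asi >= 0xD0 && asi <= 0xD3) || (asi >= 0xD8 && asi <= 0xDB))
        return ASI_CLASS_SHORT_FP;
    return ASI_CLASS_OTHER;
}
```

A decoder model could consult such a category first and then apply the per-group instruction and alignment checks described in section 10.4.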





10.4 Special Memory Access ASIs


This section describes special memory access ASIs that are not described in other
sections.


10.4.1 ASIs 10₁₆, 11₁₆, 16₁₆, and 17₁₆
(ASI_*AS_IF_USER*)


These ASIs are intended to be used in accesses from privileged and hyperprivileged
mode, but are processed as if they were issued from nonprivileged mode. Therefore,
they are subject to privilege-related exceptions. They are distinguished from each
other by the context from which the access is made, as described in TABLE 10-2.


When one of these ASIs is specified in a load alternate or store alternate instruction,
the virtual processor behaves as follows:

■ In nonprivileged mode, a privileged_action exception occurs.
■ In any other privilege mode:
  – If U/DMMU TTE.p = 1, a data_access_exception (privilege violation) exception
    occurs.
  – Otherwise, the access occurs and its endianness is determined by the current
    privilege mode and the U/DMMU TTE.ie bit. In hyperprivileged mode, the access
    is always made in big-endian byte order. In privileged mode, if U/DMMU
    TTE.ie = 0, the access is big-endian; otherwise, it is little-endian.
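The decision sequence for ASIs 10₁₆, 11₁₆, 16₁₆, and 17₁₆ can be written as a small function. This is a sketch of the rules stated in the list, not a hardware description; the enum and function names are hypothetical, and the TTE bits are passed in as plain booleans rather than read from a real TLB entry.

```c
#include <stdbool.h>

/* Privilege modes named in the text. */
typedef enum { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED } priv_mode_t;

/* Outcomes of a load/store alternate using an ASI_*AS_IF_USER* ASI. */
typedef enum {
    AIU_BIG_ENDIAN,        /* access performed, big-endian byte order     */
    AIU_LITTLE_ENDIAN,     /* access performed, little-endian byte order  */
    AIU_PRIVILEGED_ACTION, /* privileged_action exception                 */
    AIU_DATA_ACCESS_EXC    /* data_access_exception (privilege violation) */
} aiu_result_t;

/* tte_p and tte_ie model the U/DMMU TTE.p and TTE.ie bits of the target page. */
static aiu_result_t as_if_user_access(priv_mode_t mode, bool tte_p, bool tte_ie)
{
    if (mode == MODE_NONPRIVILEGED)
        return AIU_PRIVILEGED_ACTION;
    if (tte_p)                    /* privileged page: not accessible "as if user" */
        return AIU_DATA_ACCESS_EXC;
    if (mode == MODE_HYPERPRIVILEGED)
        return AIU_BIG_ENDIAN;    /* always big-endian in hyperprivileged mode */
    return tte_ie ? AIU_LITTLE_ENDIAN : AIU_BIG_ENDIAN;
}
```

The TTE.p check comes before the endianness choice because, per the text, a privileged page never produces an access in either privileged or hyperprivileged mode.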
TABLE 10-2 Privileged ASI_*AS_IF_USER* ASIs

ASI  | ASI Names | Addressing (Context)
10₁₆ | ASI_AS_IF_USER_PRIMARY (ASI_AIUP) | Virtual (Primary)
11₁₆ | ASI_AS_IF_USER_SECONDARY (ASI_AIUS) | Virtual (Secondary)
16₁₆ | ASI_BLOCK_AS_IF_USER_PRIMARY (ASI_BLK_AIUP) | Virtual (Primary)
17₁₆ | ASI_BLOCK_AS_IF_USER_SECONDARY (ASI_BLK_AIUS) | Virtual (Secondary)

Endianness of access (all four ASIs): in nonprivileged or privileged mode, big-endian
when U/DMMU TTE.ie = 0 and little-endian when U/DMMU TTE.ie = 1; in
hyperprivileged mode, always big-endian.


10.4.2 ASIs 18₁₆, 19₁₆, 1E₁₆, and 1F₁₆
(ASI_*AS_IF_USER*_LITTLE)


These ASIs are little-endian versions of ASIs 10₁₆, 11₁₆, 16₁₆, and 17₁₆
(ASI_*AS_IF_USER*), described in section 10.4.1. Each operates identically to the
corresponding non-little-endian ASI, except that if an access occurs, its endianness is
the opposite of that for the corresponding non-little-endian ASI.





These ASIs are intended to be used in accesses from privileged and hyperprivileged
mode, but are processed as if they were issued from nonprivileged mode. Therefore,
they are subject to privilege-related exceptions. They are distinguished from each
other by the context from which the access is made, as described in TABLE 10-3.


When one of these ASIs is specified in a load alternate or store alternate instruction,
the virtual processor behaves as follows:

■ In nonprivileged mode, a privileged_action exception occurs.
■ In any other privilege mode:
  – If U/DMMU TTE.p = 1, a data_access_exception (privilege violation) exception
    occurs.
  – Otherwise, the access occurs and its endianness is determined by the U/DMMU
    TTE.ie bit. If U/DMMU TTE.ie = 0, the access is little-endian; otherwise, it is
    big-endian.
TABLE 10-3 Privileged ASI_*AS_IF_USER*_LITTLE ASIs

ASI  | ASI Names | Addressing (Context)
18₁₆ | ASI_AS_IF_USER_PRIMARY_LITTLE (ASI_AIUPL) | Virtual (Primary)
19₁₆ | ASI_AS_IF_USER_SECONDARY_LITTLE (ASI_AIUSL) | Virtual (Secondary)
1E₁₆ | ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE (ASI_BLK_AIUPL) | Virtual (Primary)
1F₁₆ | ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE (ASI_BLK_AIUSL) | Virtual (Secondary)

Endianness of access (all four ASIs): little-endian when U/DMMU TTE.ie = 0;
big-endian when U/DMMU TTE.ie = 1.


10.4.3 ASI 14₁₆ (ASI_REAL)


When ASI_REAL is specified in any load alternate, store alternate, or prefetch
alternate instruction, the virtual processor behaves as follows:

■ In nonprivileged mode, a privileged_action exception occurs.
■ In any other privilege mode:
  – The VA is passed through to the RA.
  – During address translation, context values are disregarded.
  – The endianness of the access is determined by the U/DMMU TTE.ie bit; if
    U/DMMU TTE.ie = 0, the access is big-endian; otherwise, it is little-endian.

Even if data address translation is disabled, an access with this ASI is still a
cacheable access.
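The byte-order rule driven by TTE.ie can be illustrated by assembling a 64-bit value from eight bytes held at increasing addresses. This sketch models only endianness; the VA-to-RA pass-through and the cacheability of the access are not represented, and the function name assemble_u64 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 64-bit value from eight bytes stored at increasing addresses.
 * For ASI_REAL, tte_ie = 0 selects big-endian and tte_ie = 1 selects
 * little-endian; ASI_REAL_LITTLE would use the opposite setting. */
static uint64_t assemble_u64(const uint8_t mem[8], bool tte_ie)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        /* Big-endian: the lowest address is the most significant byte.
         * Little-endian: the highest address is the most significant byte. */
        int idx = tte_ie ? 7 - i : i;
        value = (value << 8) | mem[idx];
    }
    return value;
}
```

For the bytes 01, 02, …, 08 (hex) at increasing addresses, TTE.ie = 0 yields 0x0102030405060708 and TTE.ie = 1 yields 0x0807060504030201.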


10.4.4 ASI 15₁₆ (ASI_REAL_IO)


Accesses with ASI_REAL_IO bypass the external cache and behave as if the side-effect
bit (TTE.e bit) is set. When this ASI is specified in any load alternate or store
alternate instruction, the virtual processor behaves as follows:

■ In nonprivileged mode, a privileged_action exception occurs.
■ If used with a CASA, CASXA, LDSTUBA, SWAPA, or PREFETCHA instruction, a
  data_access_exception exception occurs.
■ Used with any other load alternate or store alternate instruction, in privileged
  mode or hyperprivileged mode:
  – The VA is passed through to the RA.
  – During address translation, context values are disregarded.
  – The endianness of the access is determined by the U/DMMU TTE.ie bit; if
    U/DMMU TTE.ie = 0, the access is big-endian; otherwise, it is little-endian.
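The ASI_REAL_IO rules can be summarized as a small decision function. This is a sketch with hypothetical names; in particular, the text does not say which exception wins when a nonprivileged program issues, say, CASA with this ASI, so checking privilege first is an assumption made here for illustration.

```c
#include <stdbool.h>

typedef enum { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED } priv_mode_t;

typedef enum {
    REAL_IO_ACCESS,            /* access proceeds: VA used as RA, context ignored */
    REAL_IO_PRIVILEGED_ACTION, /* privileged_action exception                     */
    REAL_IO_DATA_ACCESS_EXC    /* data_access_exception exception                 */
} real_io_result_t;

/* atomic_or_prefetch is true for CASA, CASXA, LDSTUBA, SWAPA, and PREFETCHA. */
static real_io_result_t real_io_access(priv_mode_t mode, bool atomic_or_prefetch)
{
    if (mode == MODE_NONPRIVILEGED)   /* checked first: an assumed priority */
        return REAL_IO_PRIVILEGED_ACTION;
    if (atomic_or_prefetch)
        return REAL_IO_DATA_ACCESS_EXC;
    return REAL_IO_ACCESS;
}
```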


10.4.5 ASI 1C₁₆ (ASI_REAL_LITTLE)


ASI_REAL_LITTLE is a little-endian version of ASI 14₁₆ (ASI_REAL). It operates
identically to ASI_REAL, except that if an access occurs, its endianness is the
opposite of that for ASI_REAL.














10.4.6 ASI 1D₁₆ (ASI_REAL_IO_LITTLE)


ASI_REAL_IO_LITTLE is a little-endian version of ASI 15₁₆ (ASI_REAL_IO). It
operates identically to ASI_REAL_IO, except that if an access occurs, its endianness is
the opposite of that for ASI_REAL_IO.

















10.4.7 ASIs 22₁₆, 23₁₆, 27₁₆, 2A₁₆, 2B₁₆, 2F₁₆
(Privileged Load Integer Twin Extended Word)


ASIs 22₁₆, 23₁₆, 27₁₆, 2A₁₆, 2B₁₆, and 2F₁₆ exist for use with the (nonportable)
LDTXA instruction as atomic Load Integer Twin Extended Word operations (see Load
Integer Twin Extended Word from Alternate Space on page 270). These ASIs are
distinguished by the context from which the access is made and the endianness of
the access, as described in TABLE 10-4.
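All of the twin-extended-word ASIs in this chapter share one rule: they are accepted only by LDTXA, which requires a 16-byte-aligned operand address, and any other alternate-space instruction gets data_access_exception without an alignment check. A minimal C sketch of that rule follows; the names are hypothetical and ASI membership is assumed to have been checked already.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    TWINX_ACCESS,          /* 128-bit atomic load proceeds      */
    TWINX_MEM_NOT_ALIGNED, /* mem_address_not_aligned exception */
    TWINX_DATA_ACCESS_EXC  /* data_access_exception exception   */
} twinx_result_t;

/* is_ldtxa: the instruction is LDTXA; addr: the effective operand address. */
static twinx_result_t twinx_access(bool is_ldtxa, uint64_t addr)
{
    if (!is_ldtxa)
        return TWINX_DATA_ACCESS_EXC;  /* alignment is never checked in this case */
    if ((addr & 0xFu) != 0)
        return TWINX_MEM_NOT_ALIGNED;  /* operand must be 16-byte aligned */
    return TWINX_ACCESS;
}
```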
TABLE 10-4 Privileged Load Integer Twin Extended Word / Block Store Init ASIs

ASI  | ASI Names | Addressing (Context) | Endianness of Access
22₁₆ | ASI_TWINX_AS_IF_USER_PRIMARY (ASI_TWINX_AIUP) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
23₁₆ | ASI_TWINX_AS_IF_USER_SECONDARY (ASI_TWINX_AIUS) | Virtual (Secondary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
27₁₆ | ASI_TWINX_NUCLEUS (ASI_TWINX_N) | Virtual† (Nucleus) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
2A₁₆ | ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE (ASI_TWINX_AIUP_L) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
2B₁₆ | ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE (ASI_TWINX_AIUS_L) | Virtual (Secondary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
2F₁₆ | ASI_TWINX_NUCLEUS_LITTLE (ASI_TWINX_NL) | Virtual† (Nucleus) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1

† In hyperprivileged mode, this ASI uses Physical addressing.


When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.


Compatibility Note: These ASIs replaced ASIs 24₁₆ and 2C₁₆ used in earlier
UltraSPARC implementations; see the detailed Compatibility Note on page 447.


10.4.8 ASIs 26₁₆ and 2E₁₆ (Privileged Load Integer Twin
Extended Word, Real Addressing)


ASIs 26₁₆ and 2E₁₆ exist for use with the LDTXA instruction as atomic Load Integer
Twin Extended Word operations using Real addressing (see Load Integer Twin
Extended Word from Alternate Space on page 270). These two ASIs are distinguished by
the endianness of the access, as described in TABLE 10-5.
TABLE 10-5 Load Integer Twin Extended Word (Real) ASIs

ASI  | ASI Name | Addressing (Context) | Endianness of Access
26₁₆ | ASI_TWINX_REAL (ASI_TWINX_R) | Real | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
2E₁₆ | ASI_TWINX_REAL_LITTLE (ASI_TWINX_REAL_L) | Real | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1


When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.


Compatibility Note: These ASIs replaced ASIs 34₁₆ and 3C₁₆ used in earlier
UltraSPARC implementations; see the Compatibility Note on page 447.


10.4.9 ASIs 30₁₆, 31₁₆, 36₁₆, 38₁₆, 39₁₆, 3E₁₆
(ASI_AS_IF_PRIV_*)

















These ASIs are intended to be used in accesses from hyperprivileged mode, but are
processed as if they were issued from privileged mode. These ASIs are distinguished
by the context from which the access is made and the endianness of the access, as
described in TABLE 10-6.


When one of these ASIs is specified in a load alternate or store alternate instruction,
the virtual processor behaves as follows:

■ In nonprivileged or privileged mode, a privileged_action exception occurs.
■ In hyperprivileged mode:
  – The endianness of the access is determined by the U/DMMU TTE.ie bit; if
    U/DMMU TTE.ie = 0, the access is big-endian; otherwise, it is little-endian.
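The AS_IF_PRIV rules reduce to a privilege check followed by an endianness choice. In this sketch (hypothetical names), the little_asi flag stands for the _LITTLE variants 38₁₆, 39₁₆, and 3E₁₆, whose byte order is the opposite of the TTE.ie-selected order according to TABLE 10-6; TTE.p handling is not modeled because the text does not describe it for these ASIs.

```c
#include <stdbool.h>

typedef enum { MODE_NONPRIVILEGED, MODE_PRIVILEGED, MODE_HYPERPRIVILEGED } priv_mode_t;

typedef enum { AIP_BIG_ENDIAN, AIP_LITTLE_ENDIAN, AIP_PRIVILEGED_ACTION } aip_result_t;

/* tte_ie models U/DMMU TTE.ie; little_asi is true for ASIs 38, 39, 3E (hex). */
static aip_result_t as_if_priv_access(priv_mode_t mode, bool tte_ie, bool little_asi)
{
    if (mode != MODE_HYPERPRIVILEGED)
        return AIP_PRIVILEGED_ACTION;
    /* A _LITTLE ASI inverts the byte order that TTE.ie would otherwise select. */
    bool little = (tte_ie != little_asi);
    return little ? AIP_LITTLE_ENDIAN : AIP_BIG_ENDIAN;
}
```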
TABLE 10-6 Hyperprivileged AS_IF_PRIV_* ASIs

ASI  | ASI Names | Addressing (Context) | Endianness of Access
30₁₆ | ASI_AS_IF_PRIV_PRIMARY (ASI_AIPP) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
31₁₆ | ASI_AS_IF_PRIV_SECONDARY (ASI_AIPS) | Virtual (Secondary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
36₁₆ | ASI_AS_IF_PRIV_NUCLEUS (ASI_AIPN) | Virtual (Nucleus) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
38₁₆ | ASI_AS_IF_PRIV_PRIMARY_LITTLE (ASI_AIPP_L) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
39₁₆ | ASI_AS_IF_PRIV_SECONDARY_LITTLE (ASI_AIPS_L) | Virtual (Secondary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
3E₁₆ | ASI_AS_IF_PRIV_NUCLEUS_LITTLE (ASI_AIPN_L) | Virtual (Nucleus) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1

10.4.10 ASIs E2₁₆, E3₁₆, EA₁₆, EB₁₆
(Nonprivileged Load Integer Twin Extended
Word)


ASIs E2₁₆, E3₁₆, EA₁₆, and EB₁₆ exist for use with the (nonportable) LDTXA
instruction as atomic Load Integer Twin Extended Word operations (see Load Integer
Twin Extended Word from Alternate Space on page 270). These ASIs are distinguished
by the address space accessed (Primary or Secondary) and the endianness of the
access, as described in TABLE 10-7.
TABLE 10-7 Load Integer Twin Extended Word ASIs

ASI  | ASI Names | Addressing (Context) | Endianness of Access
E2₁₆ | ASI_TWINX_PRIMARY (ASI_TWINX_P) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
E3₁₆ | ASI_TWINX_SECONDARY (ASI_TWINX_S) | Virtual (Secondary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
EA₁₆ | ASI_TWINX_PRIMARY_LITTLE (ASI_TWINX_PL) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
EB₁₆ | ASI_TWINX_SECONDARY_LITTLE (ASI_TWINX_SL) | Virtual (Secondary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1


When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.


10.4.11 Block Load and Store ASIs


ASIs 16₁₆, 17₁₆, 1E₁₆, 1F₁₆, F0₁₆, F1₁₆, F8₁₆, and F9₁₆ exist for use with LDDFA and
STDFA instructions as Block Load (LDBLOCKF) and Block Store (STBLOCKF)
operations (see Block Load on page 247 and Block Store on page 335).


When these ASIs are used with the LDDFA (STDFA) opcode for Block Load (Store),
a mem_address_not_aligned exception is generated if the operand address is not
64-byte aligned.


If a Block Load or Block Store ASI is used with any other Load Alternate, Store
Alternate, Atomic Load-Store Alternate, or PREFETCHA instruction, a
data_access_exception exception is always generated and
mem_address_not_aligned is not generated.
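For the Block Load and Store ASIs, the instruction check precedes the 64-byte alignment check. A minimal C sketch of this rule, with hypothetical names, assuming the caller has already established that the ASI is one of the eight block ASIs:

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    BLOCK_ACCESS,          /* 64-byte (8 x 8-byte) block transfer proceeds */
    BLOCK_MEM_NOT_ALIGNED, /* mem_address_not_aligned exception            */
    BLOCK_DATA_ACCESS_EXC  /* data_access_exception exception              */
} block_result_t;

/* is_lddfa_or_stdfa: the instruction is LDDFA or STDFA (LDBLOCKF/STBLOCKF form). */
static block_result_t block_access(bool is_lddfa_or_stdfa, uint64_t addr)
{
    if (!is_lddfa_or_stdfa)
        return BLOCK_DATA_ACCESS_EXC;  /* alignment is not checked in this case */
    if ((addr & 0x3Fu) != 0)
        return BLOCK_MEM_NOT_ALIGNED;  /* operand must be 64-byte aligned */
    return BLOCK_ACCESS;
}
```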
10.4.12 Partial Store ASIs


ASIs C0₁₆–C5₁₆ and C8₁₆–CD₁₆ exist for use with the STDFA instruction as Partial
Store (STPARTIALF) operations (see Store Partial Floating-Point on page 347).


When these ASIs are used with STDFA for Partial Store, a
mem_address_not_aligned exception is generated if the operand address is not
8-byte aligned, and an illegal_instruction exception is generated if i = 1 in the
instruction and the ASI register contains one of the Partial Store ASIs.


If one of these ASIs is used with a Store Alternate instruction other than STDFA, or
with a Load Alternate, Atomic Load-Store Alternate, or PREFETCHA instruction, a
data_access_exception exception is generated and mem_address_not_aligned,
LDDF_mem_address_not_aligned, and illegal_instruction (for i = 1) are not
generated.


ASIs C0₁₆–C5₁₆ and C8₁₆–CD₁₆ are only defined for use in Partial Store operations
(see page 347). None of them should be used with LDDFA; however, if any of those
ASIs is used with LDDFA, the resulting behavior is specified in the LDDFA
instruction description on page 256.


10.4.13 Short Floating-Point Load and Store ASIs


ASIs D0₁₆–D3₁₆ and D8₁₆–DB₁₆ exist for use with the LDDFA and STDFA
instructions as Short Floating-Point Load and Store operations (see Load Floating-
Point Register on page 251 and Store Floating-Point on page 339).


When ASI D2₁₆, D3₁₆, DA₁₆, or DB₁₆ is used with LDDFA (STDFA) for a 16-bit Short
Floating-Point Load (Store), a mem_address_not_aligned exception is generated if
the operand address is not halfword-aligned.


If any of these ASIs is used with any other Load Alternate, Store Alternate, Atomic
Load-Store Alternate, or PREFETCHA instruction, a data_access_exception
exception is always generated and mem_address_not_aligned is not generated.
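The exception rules for the Partial Store and Short Floating-Point ASIs (sections 10.4.12 and 10.4.13) can be sketched together. The names are hypothetical. Two points are assumptions: the text does not state whether the i = 1 check or the alignment check takes priority for STDFA, and LDDFA with a Partial Store ASI is deliberately left to the LDDFA instruction description, so the partial-store helper must not be used for LDDFA.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ACC_OK,
    ACC_ILLEGAL_INSTRUCTION, /* illegal_instruction exception     */
    ACC_MEM_NOT_ALIGNED,     /* mem_address_not_aligned exception */
    ACC_DATA_ACCESS_EXC      /* data_access_exception exception   */
} acc_result_t;

/* Partial Store ASIs (C0-C5, C8-CD hex). is_stdfa: the instruction is STDFA;
 * i_bit: the instruction's i field. Not for LDDFA (specified elsewhere). */
static acc_result_t partial_store_access(bool is_stdfa, bool i_bit, uint64_t addr)
{
    if (!is_stdfa)
        return ACC_DATA_ACCESS_EXC;
    if (i_bit)                       /* checking i before alignment is an assumption */
        return ACC_ILLEGAL_INSTRUCTION;
    if ((addr & 0x7u) != 0)
        return ACC_MEM_NOT_ALIGNED;  /* must be 8-byte aligned */
    return ACC_OK;
}

/* Short floating-point ASIs (D0-D3, D8-DB hex). Only the 16-bit variants
 * (D2, D3, DA, DB) require halfword alignment; byte accesses cannot be misaligned. */
static acc_result_t short_fp_access(uint8_t asi, bool is_lddfa_or_stdfa, uint64_t addr)
{
    bool is16 = (asi == 0xD2 || asi == 0xD3 || asi == 0xDA || asi == 0xDB);
    if (!is_lddfa_or_stdfa)
        return ACC_DATA_ACCESS_EXC;
    if (is16 && (addr & 0x1u) != 0)
        return ACC_MEM_NOT_ALIGNED;
    return ACC_OK;
}
```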





10.5 ASI-Accessible Registers


This section describes the Data Watchpoint registers, scratchpad registers, and CMT
registers.


A list of UltraSPARC Architecture 2005 ASIs is shown in TABLE 10-1 on page 423.


10.5.1 Privileged Scratchpad Registers
(ASI_SCRATCHPAD)


An UltraSPARC Architecture virtual processor includes eight Scratchpad registers
(64 bits each, read/write accessible) (impl. dep. #302-U4-Cs10). The use of the
Scratchpad registers is completely defined by software.


For conventional uses of Scratchpad registers, see "Scratchpad Register Usage" in 
Software Considerations, contained in the separate volume UltraSPARC Architecture 
Application Notes. 


The Scratchpad registers are intended to be used by performance-critical trap 
handler code. 


The addresses of the privileged scratchpad registers are defined in TABLE 10-8. 


TABLE 10-8 Scratchpad Registers

Assembly Language ASI Name | ASI# | Virtual Address | Privileged Scratchpad Register #
ASI_SCRATCHPAD             | 20₁₆ | 00₁₆            | 0
                           |      | 08₁₆            | 1
                           |      | 10₁₆            | 2
                           |      | 18₁₆            | 3
                           |      | 20₁₆            | 4
                           |      | 28₁₆            | 5
                           |      | 30₁₆            | 6
                           |      | 38₁₆            | 7


IMPL. DEP. #404-S10: The degree to which Scratchpad registers 4–7 are accessible to
privileged software is implementation dependent. Each may be

(1) fully accessible,

(2) accessible, with access much slower than to scratchpad registers 0–3 (emulated
by a data_access_exception trap to hyperprivileged software), or

(3) inaccessible (causing a data_access_exception).


V9 Compatibility Note: Privileged scratchpad registers are an UltraSPARC Architecture
extension to SPARC V9.
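TABLE 10-8 places the eight 64-bit Scratchpad registers at consecutive 8-byte virtual addresses under ASI 20₁₆. The mapping between register number and VA can be sketched in C as follows; the macro and function names are hypothetical, and the implementation-dependent accessibility of registers 4–7 (IMPL. DEP. #404-S10) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define ASI_SCRATCHPAD 0x20u  /* ASI number from TABLE 10-8 */

/* Virtual address of privileged Scratchpad register n (0-7): 64-bit
 * registers at consecutive 8-byte offsets starting at VA 0. */
static uint64_t scratchpad_va(unsigned n)
{
    assert(n < 8);
    return (uint64_t)n * 8u;
}

/* Inverse mapping: register number for a VA, or -1 if the VA does not
 * name a scratchpad register in TABLE 10-8. */
static int scratchpad_index(uint64_t va)
{
    if (va > 0x38u || (va & 0x7u) != 0)
        return -1;
    return (int)(va / 8u);
}
```

For example, register 3 is at VA 18₁₆, and VA 1C₁₆ does not correspond to any scratchpad register.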
10.5.2 Hyperprivileged Scratchpad Registers
(ASI_HYP_SCRATCHPAD)


An UltraSPARC Architecture virtual processor includes eight hyperprivileged
Scratchpad registers (64 bits each, read/write accessible). The use of the
hyperprivileged Scratchpad registers is completely defined by software.


The hyperprivileged Scratchpad registers are intended to be used in hyperprivileged 
trap handler code. 


The hyperprivileged Scratchpad registers are accessed with Load Alternate and 
Store Alternate instructions, using the ASIs and addresses listed in TABLE 10-9. 


IMPL. DEP. #407-S10: It is implementation dependent whether each hyperprivileged
Scratchpad register is aliased to the corresponding privileged Scratchpad register or
is an independent register.


TABLE 10-9 Hyperprivileged Scratchpad Registers

Assembly Language ASI Name | ASI# | Virtual Address | Hyperprivileged Scratchpad Register #
ASI_HYP_SCRATCHPAD         | 4F₁₆ | 00₁₆            | 0
                           |      | 08₁₆            | 1
                           |      | 10₁₆            | 2
                           |      | 18₁₆            | 3
                           |      | 20₁₆            | 4
                           |      | 28₁₆            | 5
                           |      | 30₁₆            | 6
                           |      | 38₁₆            | 7





V9 Compatibility Note: Hyperprivileged Scratchpad registers are an UltraSPARC
Architecture extension to SPARC V9.


10.5.3 CMT Registers Accessed Through ASIs 


All chip-level multithreading (CMT) registers are accessed through ASIs. See
Accessing CMT Registers on page 533 for descriptions of the ASI registers used to
control CMT functions.


10.5.4 ASI Changes in the UltraSPARC Architecture 


The following Compatibility Notes summarize the UltraSPARC ASI changes in 
UltraSPARC Architecture. 
Compatibility Note: The names of several ASIs used in earlier UltraSPARC
implementations have changed in UltraSPARC Architecture. Their functions have not
changed; only their names have changed.

ASI  | Previous UltraSPARC | UltraSPARC Architecture
14₁₆ | ASI_PHYS_USE_EC | ASI_REAL
15₁₆ | ASI_PHYS_BYPASS_EC_WITH_EBIT | ASI_REAL_IO
1C₁₆ | ASI_PHYS_USE_EC_LITTLE (ASI_PHYS_USE_EC_L) | ASI_REAL_LITTLE
1D₁₆ | ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE (ASI_PHYS_BYPASS_EC_WITH_EBIT_L) | ASI_REAL_IO_LITTLE








Compatibility Note: The names and ASI assignments (but not functions) changed
between earlier UltraSPARC implementations and UltraSPARC Architecture for the
following ASIs:

Previous UltraSPARC                                         | UltraSPARC Architecture
ASI# | Name                                                 | ASI# | Name
24₁₆ | ASI_NUCLEUS_QUAD_LDD                                 | 27₁₆ | ASI_TWINX_NUCLEUS (ASI_TWINX_N)
2C₁₆ | ASI_NUCLEUS_QUAD_LDD_LITTLE (ASI_NUCLEUS_QUAD_LDD_L) | 2F₁₆ | ASI_TWINX_NUCLEUS_LITTLE (ASI_TWINX_NL)
34₁₆ | ASI_QUAD_LDD_PHYS                                    | 26₁₆ | ASI_TWINX_REAL (ASI_TWINX_R)
3C₁₆ | ASI_QUAD_LDD_LITTLE (ASI_QUAD_LDD_L)                 | 2E₁₆ | ASI_TWINX_REAL_LITTLE (ASI_TWINX_REAL_L)
CHAPTER 11


Performance Instrumentation





This chapter describes the architecture for performance monitoring hardware on 
UltraSPARC Architecture processors. The architecture is based on the design of 
performance instrumentation counters in previous UltraSPARC Architecture 
processors, with an extension for the selective sampling of instructions. 





11.1 High-Level Requirements


11.1.1 Usage Scenarios


The performance monitoring hardware on UltraSPARC Architecture processors
addresses the needs of various kinds of users. Four usage scenarios are envisioned:


■ System-wide performance monitoring. In this scenario, someone skilled in system
  performance analysis (e.g., a Systems Engineer) is using analysis tools to evaluate
  the performance of the entire system. An example of such a tool is cpustat. The
  objective is to obtain performance data relating to the configuration and behavior
  of the system, e.g., the utilization of the memory system.

■ Self-monitoring of performance by the operating system. In this scenario, the OS
  gathers performance data in order to tune the operation of the system. Some
  examples might be:
  (a) determining whether the processors in the system should be running in
      single- or multi-stranded mode;
  (b) determining the affinity of a process to a processor by examining that
      process's memory behavior.


■ Performance analysis of an application by a developer. In this scenario, a developer
  is trying to optimize the performance of a specific application by altering the
  source code of the application or the compilation options. The developer needs to
  know the performance characteristics of the components of the application at a
  coarse grain and, where these are problematic, to be able to obtain fine-grained
  performance information. Using this information, the developer alters the source
  or compilation parameters, re-runs the application, and observes the new
  performance characteristics. This process is repeated until performance is
  acceptable or no further improvements can be found.


For example, measurements might show that a loop nest is performing poorly. Upon closer inspection, the developer determines that the loop has poor cache behavior, and upon more detailed inspection finds a specific operation that repeatedly misses the cache. Reorganizing the code and/or data may improve the cache behavior.


■ Monitoring of an application's performance, e.g., by a Java Virtual Machine. In this scenario, the application is not executing directly on the hardware; its execution is mediated by a piece of system software, which for the purposes of this document is called a Virtual Machine. This may be a Java VM, or a binary translation system running software compiled for another architecture or for an earlier version of the UltraSPARC Architecture. One goal of the VM is to optimize the behavior of the application by monitoring its performance and dynamically reorganizing the execution of the application (e.g., by selective recompilation of the application).


This scenario differs from the previous one principally in the time allowed to 
gather performance data. Because the data are being gathered during the 
execution of the program, the measurements must not adversely affect the 
performance of the application by more than, say, a few percent, and must yield 
insight into the performance of the application in a relatively short time 
(otherwise, optimization opportunities are deferred for too long). This implies an
observation mechanism with very low overhead, so that many observations
can be made in a short time.


In contrast, a developer optimizing an application has the luxury of running or 
re-running the application for a considerable period of time (minutes or even 
hours) to gather data. However, the developer will also expect a level of precision 
and detail in the data which would overwhelm a virtual machine, so the accuracy 
of the data required by a virtual machine need not be as high as that supplied to 
the developer. 


The first two scenarios (system-wide monitoring and self-monitoring by the operating system) are adequately dealt with by a suitable set of performance
counters capable of counting a variety of performance-related events. Counters are
ideal for these situations because they provide low-overhead statistics without any 
intrusion into the behavior of the system or disruption to the code being monitored. 
However, counters may not adequately address the latter two scenarios, in which 
detailed and timely information is required at the level of individual instructions. 
Therefore, UltraSPARC Architecture processors may also implement an instruction 
sampling mechanism. 
11.1.2 Metrics

There are two classes of data reported by a performance instrumentation mechanism:

■ Architectural performance metrics. These are metrics related to the observable execution of code at the architectural level (UltraSPARC Architecture). Examples include:
  □ The number of instructions executed
  □ The number of floating-point instructions executed
  □ The number of conditional branch instructions executed

■ Implementation performance metrics. These describe the behavior of the microprocessor in terms of its implementation, and would not necessarily apply to another implementation of the architecture.

In optimizing the performance of an application or system, attention is paid first to architectural metrics, which are therefore the more important class. Only in performance-critical cases do implementation metrics receive attention, since using them requires a fairly extensive understanding of the specific implementation of the UltraSPARC Architecture.

11.1.3 Accuracy Requirements


Accuracy requirements for performance instrumentation vary depending on the 
scenario. The requirements are complicated by the possibly speculative nature of 
UltraSPARC Architecture processor implementations. For example, an 
implementation may include in its cache miss statistics the misses induced by 
speculative executions which were subsequently flushed, or provide two separate 
statistics, one for the misses induced by flushed instructions and one for misses 
induced by retired instructions. Although the latter would be desirable, the 
additional implementation complexity of associating events with specific 
instructions is significant, and so all events may be counted without distinction. The 
instruction sampling mechanism may distinguish between instructions that retired 
and those that were flushed, in which case sampling can be used to obtain statistical 
estimates of the frequencies of operations induced by mis-speculation. 


For critical performance measurements, architectural event counts must be accurate 
to a high degree (1 part in 10°). Which counters are considered performance-critical 
(and therefore accurate to 1 part in 10?) are specified in implementation-specific 
documentation. 


Implementation event counts must be accurate to 1 part in 10°, not including the 
speculative effects mentioned above. An upper bound on counter skew must be 
stated in implementation-specific documentation. 
Programming Note: Increasing the time between counter reads mitigates the inaccuracies that could be introduced by counter skew (due to speculative effects).





11.2 Performance Counters and Controls


The performance instrumentation hardware provides performance instrumentation 
counters (PICs). The number and size of performance counters are implementation
dependent, but each performance counter register contains at least one 32-bit 
counter. It is implementation dependent whether the performance counter registers 
are accessed as ASRs or are accessed through ASIs. 


There are one or more performance counter control registers (PCRs) associated with 
the counter registers. It is implementation dependent whether the PCRs are accessed 
as ASRs or are accessed through ASIs. 


Each counter in a counter register can count one kind of event at a time. The
number of kinds of events that can be counted is implementation dependent.
For each performance counter register, the corresponding control register is used to 
select the event type being counted. A counter is incremented whenever an event of 
the matching type occurs. A counter may be incremented by an event caused by an 
instruction which is subsequently flushed (for example, due to mis-speculation). 
Counting of events may be controlled based on privilege mode or on the strand in 
which they occur. Masking may be provided to allow counting of subgroups of 
events (for example, various occurrences of different opcode groups). 
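As a rough illustration of event selection and mode-based filtering, the sketch below models one counter's control settings. Everything about the layout is hypothetical: `PcrModel`, `event_select`, and the three per-mode enable flags are illustrative names, not a defined PCR format, because event encodings and control fields are implementation dependent. The model counts every matching event without asking whether the causing instruction later retired, which matches the permission above to count events caused by flushed instructions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PcrModel:
    """Hypothetical control settings for one counter (not a real PCR layout)."""
    event_select: str             # the single kind of event this counter counts
    count_nonprivileged: bool     # hypothetical per-mode enable flags
    count_privileged: bool
    count_hyperprivileged: bool


def should_count(pcr: PcrModel, event: str, mode: str) -> bool:
    """True if an event of this kind, occurring in this privilege mode, is counted."""
    if event != pcr.event_select:
        return False
    enabled_by_mode = {
        "nonprivileged": pcr.count_nonprivileged,
        "privileged": pcr.count_privileged,
        "hyperprivileged": pcr.count_hyperprivileged,
    }
    return enabled_by_mode[mode]


def count_events(pcr: PcrModel, stream: Iterable[Tuple[str, str]]) -> int:
    """Count (event, mode) pairs that pass both the event-select and mode filters."""
    return sum(1 for event, mode in stream if should_count(pcr, event, mode))
```

Filtering by strand could be added the same way, as another hypothetical field compared against the strand on which each event occurs.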


A field that indicates when a counter has overflowed must be present either in each
performance instrumentation counter or in a PCR.


Performance counters are usually provided on a per-strand basis. 


11.2.1 Counter Overflow 


Overflow of a counter is recorded in the overflow-indication field of the counter
register itself or of a separate performance counter control register.


Counter overflow indication is provided so that large counts can be maintained in 
software, beyond the range directly supported in hardware. The counters continue 
to count after an overflow, and software can utilize the overflow indicators to 
maintain additional high-order bits. 
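The overflow scheme can be sketched as follows, assuming a single 32-bit counter paired with a one-bit overflow indicator, and assuming software clears the indicator after folding it into its own count. `PicModel` and `ExtendedCounter` are hypothetical names; where the overflow field lives, and how it is read and cleared, are implementation dependent. Because one indicator bit can record only that at least one wrap occurred, the sketch also assumes software reads the counter at least once every 2^32 events.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class PicModel:
    """Hardware side (hypothetical): a 32-bit counter plus an overflow bit."""

    def __init__(self) -> None:
        self.count = 0          # low-order bits maintained by hardware
        self.overflow = False   # overflow-indication field

    def add_events(self, n: int) -> None:
        total = self.count + n
        if total > COUNTER_MASK:
            self.overflow = True            # record that the counter wrapped
        self.count = total & COUNTER_MASK   # counting continues after the wrap


class ExtendedCounter:
    """Software side: maintains the high-order bits of a wider count."""

    def __init__(self, pic: PicModel) -> None:
        self.pic = pic
        self.high = 0   # number of wraps folded in so far

    def read(self) -> int:
        if self.pic.overflow:
            self.high += 1              # assumes at most one wrap since the last read
            self.pic.overflow = False   # assumed: software clears the indicator
        return (self.high << COUNTER_BITS) | self.pic.count
```

The hardware count itself is never reset by a read, reflecting the statement that counters continue to count after an overflow; only the software-maintained high-order part changes.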
CHAPTER 12


Traps 





A trap is a vectored transfer of control to software running in a privilege mode (see 
page 454) with (typically) greater privileges. A trap in nonprivileged mode can be 
delivered to privileged mode or hyperprivileged mode. A trap that occurs while 
executing in privileged mode can be delivered to privileged mode or 
hyperprivileged mode. A trap that occurs while executing in hyperprivileged mode 
can only be delivered to hyperprivileged mode. 


The actual transfer of control occurs through a trap table that contains the first eight 
instructions (32 instructions for clean window, fast instruction access MMU miss, 
fast data access MMU miss, fast data access protection, window spill, and 
window fill traps) of each trap handler. The virtual base address of the trap table for
traps to be delivered in privileged mode is specified in the Trap Base Address (TBA) 
register. The physical base address of the trap table for traps to be delivered in 
hyperprivileged mode is specified in the Hyperprivileged Trap Base Address 
(HTBA) register. The displacement within either table is determined by the trap type 
and the current trap level (TL). One-half of each table is reserved for hardware traps; 
the other half is reserved for software traps generated by Tcc instructions. 
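The dependence of the entry address on the base register, TL, and the trap type can be illustrated for the privileged-mode table. This sketch assumes the SPARC V9 layout, TBA<63:15> || (TL > 0) || TT<8:0> || 00000, with 32-byte (eight-instruction) entries; the authoritative UltraSPARC Architecture formulas, including the HTBA-based hyperprivileged form, are given in Trap-Table Entry Addresses on page 469. `privileged_trap_vector` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1
ENTRY_BYTES = 32  # eight 4-byte instructions per trap-table entry


def privileged_trap_vector(tba: int, tl: int, tt: int) -> int:
    """Privileged-mode trap vector address under the assumed SPARC V9 layout.

    tba -- Trap Base Address register value
    tl  -- trap level in effect when the trap is taken
    tt  -- 9-bit trap type
    """
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT is a 9-bit trap type")
    base = tba & ~0x7FFF & MASK64           # keep TBA<63:15>
    tl_half = (1 << 14) if tl > 0 else 0    # TL = 0 and TL > 0 use separate halves
    return base | tl_half | (tt * ENTRY_BYTES)
```

Under this assumed layout, each TL half holds 512 entries; trap types below 100₁₆ fall in its first 8 KB and Tcc trap types (100₁₆ and above) in its second 8 KB, which is one way to see the hardware/software split described above. For example, with TBA = 1_0000_0000₁₆, trap type 024₁₆ taken at TL = 0 vectors to 1_0000_0480₁₆, and the same trap taken at TL = 1 vectors to 1_0000_4480₁₆. A 32-instruction handler occupies the space of four consecutive entries.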





A trap behaves like an unexpected procedure call. It causes the hardware to do the 
following: 


1. Save certain virtual processor state (such as program counters, CWP, ASI, CCR, 
PSTATE, and the trap type) on a hardware register stack. 


2. Enter privileged execution mode with a predefined PSTATE, or enter 
hyperprivileged mode with a predefined PSTATE and HPSTATE. 


3. Begin executing trap handler code in the trap vector. 


When the trap handler has finished, it uses either a DONE or RETRY instruction to 
return. 
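The three hardware steps and the DONE/RETRY return can be modeled as a small trap stack. This is a simplified sketch, not the architectural definition: `StrandModel` and `SavedState` are hypothetical names, the saved register state is reduced to a dictionary, the predefined PSTATE/HPSTATE values written on entry are not modeled, and the handler address is passed in rather than computed from TBA or HTBA. MAXTL = 5 is the example value used in TABLE 12-2. The PC/NPC updates follow the RETRY and DONE semantics given for precise traps later in this chapter.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SavedState:
    tpc: int
    tnpc: int
    tstate: Dict[str, int]   # simplified stand-in for CWP, ASI, CCR, PSTATE, ...
    tt: int


@dataclass
class StrandModel:
    pc: int
    npc: int
    maxtl: int = 5                       # example value (see TABLE 12-2)
    tl: int = 0
    regs: Dict[str, int] = field(default_factory=dict)
    trap_stack: Dict[int, SavedState] = field(default_factory=dict)

    def take_trap(self, tt: int, handler: int) -> None:
        if self.tl >= self.maxtl:
            raise RuntimeError("trap at TL = MAXTL: error_state")
        self.tl += 1                     # the new stack entry is indexed by the new TL
        self.trap_stack[self.tl] = SavedState(self.pc, self.npc, dict(self.regs), tt)
        self.pc, self.npc = handler, handler + 4

    def _return(self, retry: bool) -> None:
        if self.tl == 0:
            raise RuntimeError("DONE/RETRY with TL = 0 (not modeled)")
        saved = self.trap_stack.pop(self.tl)
        self.regs = saved.tstate
        if retry:                        # RETRY: re-execute the trapped instruction
            self.pc, self.npc = saved.tpc, saved.tnpc
        else:                            # DONE: continue after the trapped instruction
            self.pc, self.npc = saved.tnpc, saved.tnpc + 4
        self.tl -= 1

    def retry(self) -> None:
        self._return(retry=True)

    def done(self) -> None:
        self._return(retry=False)
```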


A trap may be caused by a Tcc instruction, an instruction-induced exception, a reset, 
an asynchronous error, or an interrupt request not directly related to a particular 
instruction. The virtual processor must appear to behave as though, before executing 
each instruction, it determines if there are any pending exceptions or interrupt
requests. If there are pending exceptions or interrupt requests, the virtual processor 
selects the highest-priority exception or interrupt request and causes a trap. 
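The per-instruction check can be sketched as a selection over the pending conditions. The sketch assumes, as in the SPARC V9 trap priority tables, that a smaller priority number denotes a higher priority; `Pending` and `select_trap` are hypothetical names, and the actual priority of each trap is defined elsewhere in this chapter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pending:
    name: str       # hypothetical label for an exception or interrupt request
    priority: int   # assumption: a smaller number means a higher priority


def select_trap(pending: List[Pending]) -> Optional[Pending]:
    """Check made before each instruction: return the highest-priority
    pending condition (which causes a trap), or None to execute normally."""
    if not pending:
        return None
    return min(pending, key=lambda p: p.priority)
```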


Thus, an exception is a condition that makes it impossible for the virtual processor to 
continue executing the current instruction stream without software intervention. A 
trap is the action taken by the virtual processor when it changes the instruction flow 
in response to the presence of an exception, interrupt, reset, or Tcc instruction.


V9 Compatibility Note: Exceptions referred to as "catastrophic error exceptions" in the SPARC V9 specification do not exist in the UltraSPARC Architecture; they are handled using normal error-reporting exceptions. (impl. dep. #31-V8-Cs10)


An interrupt is a request for service presented to a virtual processor by an external 
device. 


Traps are described in these sections: 


■ Virtual Processor Privilege Modes on page 454.
■ Virtual Processor States, Normal Traps, and RED_state Traps on page 456.
■ Trap Categories on page 461.
■ Trap Control on page 467.
■ Trap-Table Entry Addresses on page 469.
■ Trap Processing on page 486.
■ Exception and Interrupt Descriptions on page 497.
■ Register Window Traps on page 506.





12.1 Virtual Processor Privilege Modes 


An UltraSPARC Architecture virtual processor is always operating in a discrete 
privilege mode. The privilege modes are listed below in order of increasing 
privilege: 

■ Nonprivileged mode (also known as "user mode")

■ Privileged mode, in which supervisor (operating system) software primarily operates

■ Hyperprivileged mode, in which hypervisor software operates, serving as a layer between the supervisor software and the underlying virtual processor
The virtual processor's operating mode is determined by the state of two mode bits,
as shown in TABLE 12-1.

TABLE 12-1 Virtual Processor Privilege Modes

HPSTATE.hpriv   PSTATE.priv   Virtual Processor Privilege Mode
0               0             Nonprivileged
0               1             Privileged
1               (any)         Hyperprivileged





A trap is delivered to the virtual processor in either privileged mode or 
hyperprivileged mode; in which mode the trap is delivered depends on: 


■ Its trap type
■ The trap level (TL) at the time the trap is taken
■ The privilege mode at the time the trap is taken


Traps detected in nonprivileged and privileged mode can be delivered to the virtual 
processor in privileged mode or hyperprivileged mode. Traps detected in 
hyperprivileged mode are either delivered to the virtual processor in 
hyperprivileged mode or may be masked. If masked, they are held pending. 


TABLE 12-4 on page 475 indicates in which mode each trap is processed, based on the 
privilege mode at which it was detected. 


A trap delivered to privileged mode uses the privileged-mode trap vector, based 
upon the TBA register. See Trap-Table Entry Address to Privileged Mode on page 469 for 
details. A trap delivered to hyperprivileged mode uses the hyperprivileged mode 
trap vector address, based upon the HTBA register. See Trap-Table Entry Address to 
Hyperprivileged Mode on page 470 for details. 


The maximum trap level at which privileged software may execute is MAXPTL (which, on an UltraSPARC Architecture 2005 virtual processor, is 2). Therefore, if TL ≥ MAXPTL and a trap occurs that would normally be delivered in privileged mode, it is instead delivered in hyperprivileged mode; the trap table offset for watchdog reset (40₁₆) is used, and the priority and trap type of the original exception are retained. This is referred to as a "guest watchdog" trap (so named because it uses watchdog reset's trap table offset).
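The guest watchdog redirection can be expressed as a small decision. The sketch is hypothetical (`Delivery` and `deliver` are illustrative names) and models only the rule stated above: a trap destined for privileged mode that arrives while TL ≥ MAXPTL is delivered in hyperprivileged mode at the watchdog reset offset (40₁₆), keeping its original trap type. The full per-trap destination rules are in TABLE 12-4.

```python
from typing import NamedTuple, Optional

MAXPTL = 2                    # value given in the text
WATCHDOG_TABLE_OFFSET = 0x40  # trap table offset for watchdog reset


class Delivery(NamedTuple):
    mode: str                    # privilege mode in which the trap is delivered
    tt: int                      # trap type seen by the handler
    table_offset: Optional[int]  # None: use the trap's own vector offset


def deliver(destined_mode: str, tl: int, tt: int) -> Delivery:
    """Apply the guest watchdog rule to a trap with a given destination mode."""
    if destined_mode == "privileged" and tl >= MAXPTL:
        # Redirected to hyperprivileged mode; trap type (and priority) retained.
        return Delivery("hyperprivileged", tt, WATCHDOG_TABLE_OFFSET)
    return Delivery(destined_mode, tt, None)
```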


Notes: Execution in nonprivileged or privileged mode with TL > MAXPTL is an invalid condition that hyperprivileged software should never allow to occur.

Execution in nonprivileged mode with TL > 0 is an invalid condition that privileged and hyperprivileged software should never allow to occur.
FIGURE 12-1 shows how a virtual processor transitions between privilege modes, excluding transitions that can occur due to direct software writes to PSTATE.priv or HPSTATE.hpriv. In the figure, separate symbols distinguish a "trap destined for privileged mode" from a "trap destined for hyperprivileged mode".

[Figure: transitions among the Nonprivileged, Privileged, and Hyperprivileged modes, with trap edges labeled by TL relative to MAXPTL (2). Numbered conditions in the figure:
1 if (HTSTATE[TL].HPSTATE.hpriv = 0) and (TSTATE[TL].PSTATE.priv = 0)
2 if (HTSTATE[TL].HPSTATE.hpriv = 0) and (TSTATE[TL].PSTATE.priv = 1)
3 if (HTSTATE[TL].HPSTATE.hpriv = 1)]

FIGURE 12-1 Virtual Processor Privilege Mode Transition Diagram





12.2 Virtual Processor States, Normal Traps, and RED_state Traps

An UltraSPARC Architecture virtual processor is always in one of three discrete states:

■ execute_state, which is the normal execution state of the virtual processor

■ RED_state (Reset, Error, and Debug state), which is a restricted execution state reserved for processing traps that occur when TL = MAXTL - 1, and for processing hardware- and software-initiated resets

■ error_state, which is a transient state that is entered as a result of a non-reset trap or SIR when TL = MAXTL

The values of TL and HPSTATE.red affect the generated trap vector address. TL also determines where (that is, into which element of the TSTATE and HTSTATE arrays) the state is saved.
Traps processed in execute_state are called normal traps. Traps processed in RED_state are called RED_state traps.

V9 Compatibility Note: RED_state traps were called "special traps" in the SPARC V9 specification. The name was changed to clarify the terminology.

FIGURE 12-2 shows the virtual processor state transition diagram.

[Figure: transitions among execute_state, RED_state, and error_state, plus an "Any State (Including Power Off)" node. Edges are labeled with non-reset trap (NRr), SIR, and TL conditions relative to MAXTL, such as TL < MAXTL - 1, TL = MAXTL - 1, and TL = MAXTL.]

FIGURE 12-2 Virtual Processor State Diagram ("NRr" = "non-reset trap")

12.2.1 RED_state





RED_state is an acronym for Reset, Error, and Debug state. The virtual processor enters RED_state under any one of the following conditions:

■ A non-reset trap is taken when TL = MAXTL - 1.
■ A POR or WDR reset occurs.
■ An SIR reset occurs when TL < MAXTL.
■ An XIR reset occurs.
■ System software sets HPSTATE.red = 1. For this condition, no other machine state or operation is modified as a side effect of the write to HPSTATE; software must set the appropriate machine state.
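The entry conditions above, together with the rule that a non-reset trap or SIR at TL = MAXTL leads to error_state, can be summarized as a decision function. It is a sketch under stated assumptions: `state_after` is a hypothetical name, MAXTL = 5 is the example value from TABLE 12-2, XIR is omitted for brevity, and direct software writes to HPSTATE.red are not modeled.

```python
MAXTL = 5   # example value (TABLE 12-2); implementation dependent


def state_after(event: str, tl: int, red: bool = False) -> str:
    """State in which `event` is processed.

    event -- "non_reset_trap", "SIR", "POR", or "WDR" (XIR is not modeled)
    tl    -- trap level before the event is taken
    red   -- HPSTATE.red before the event
    """
    if event in ("non_reset_trap", "SIR") and tl >= MAXTL:
        return "error_state"          # no trap level left; a WDR follows
    if event in ("POR", "WDR", "SIR"):
        return "RED_state"            # SIR reaches here only if TL < MAXTL
    if event == "non_reset_trap":
        if tl == MAXTL - 1 or red:
            return "RED_state"        # processed as a RED_state trap
        return "execute_state"        # processed as a normal trap
    raise ValueError(f"event not modeled: {event}")
```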





RED_state serves two purposes:

■ During trap processing, it indicates that no more trap levels are available; that is, while executing in RED_state with TL = MAXTL, if another nested non-reset trap, SIR, or XIR is taken, the virtual processor will enter error_state. RED_state provides system software with a restricted execution environment.
■ It provides the execution environment for all reset processing.











RED_state is indicated by HPSTATE.red. When this bit is set to 1, the virtual processor is in RED_state; when this bit is zero, the virtual processor is not in RED_state, independent of the value of TL. Executing a DONE or RETRY instruction in RED_state restores the stacked copy of the HPSTATE register, which zeroes the HPSTATE.red flag if it was zero in the stacked copy. System software can also directly write 1 or 0 to HPSTATE.red with a WRHPR instruction, which forces the virtual processor to enter or exit RED_state, respectively. In this case, the WRHPR instruction should be placed in the delay slot of a jump instruction so that the PC can be changed in concert with the state change.








When RED_state is entered due to a reset or a trap, the execution environment is altered in four ways:

■ Address translation is disabled in the MMU, for both instruction and data references.
■ Watchpoints are disabled.
■ The trap vector for traps occurring in RED_state is based on the RED_state Trap Table.
■ The virtual processor enters hyperprivileged mode (HPSTATE.hpriv ← 1).


Programming Note: Setting TL ← MAXTL with a WRHPR instruction does not also set HPSTATE.red ← 1, nor does it alter any other machine state. The values of HPSTATE.red and TL are independent.

Setting HPSTATE.red with a WRHPR instruction causes the virtual processor to execute in RED_state. This results in the execution environment defined in RED_state Execution Environment on page 458. However, it is different from a RED_state trap in the sense that there are no trap-related changes in the machine state (for example, TL does not change).





The trap table organization for RED_state traps is described in RED_state Trap Table Organization on page 472.


12.2.1.1 RED_state Execution Environment

In RED_state, the virtual processor is forced to execute in a restricted environment by overriding the values of some virtual processor control and state registers.

The values are overridden, not set, allowing them to be switched atomically.

Some of the characteristics of RED_state include:

■ Memory accesses in RED_state are by default noncacheable, so there must be noncacheable scratch memory somewhere in the system.

■ The D-cache watchpoints and DMMU/UMMU can be enabled by software in RED_state, but any trap will disable them again.

■ The caches continue to snoop and maintain coherence in RED_state if DMA or other virtual processors are still issuing cacheable accesses.


IMPL. DEP. #115-V9: A processor's behavior in RED_state is implementation
dependent.





Programming Note: When RED_state is entered because of component failures, trap handler software should attempt to recover from potentially fatal error conditions or to disable the failing components. When RED_state is entered after a reset, the software should create the environment necessary to restore the system to a running state.








12.2.1.2 RED_state Entry Traps

The following reset traps are processed in RED_state:

■ Power-on reset (POR): POR causes the virtual processor to start execution at the POR entry of the RED_state trap table.

■ Watchdog reset (WDR): While in error_state, the virtual processor automatically invokes a watchdog reset to enter RED_state (impl. dep. #254-U3-Cs10).

■ Externally initiated reset (XIR): This trap is typically used as a nonmaskable interrupt for debugging purposes. When an XIR occurs, the reset trap is processed in RED_state.

■ Software-initiated reset (SIR): If TL < MAXTL when an SIR occurs, the reset trap is processed in RED_state; if TL = MAXTL when an SIR occurs, the virtual processor transitions directly to error_state.











Non-reset traps that occur when TL = MAXTL - 1 also set HPSTATE.red = 1; that is, any non-reset trap handler entered with TL = MAXTL runs in RED_state.

Any non-reset trap that sets HPSTATE.red = 1 or that occurs when HPSTATE.red = 1 branches to a special entry in the RED_state trap vector at RSTVADDR + A0₁₆. Reset traps are described in Reset Traps on page 466.
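RED_state vector selection can be sketched as an offset table relative to RSTVADDR. The offset for non-reset traps (A0₁₆) is stated above; the reset offsets (20₁₆ POR, 40₁₆ WDR, 60₁₆ XIR, 80₁₆ SIR) are assumed from the SPARC V9 RED_state trap table and should be checked against RED_state Trap Table Organization on page 472. RSTVADDR is implementation dependent, so the base used in any example is only a placeholder, and `red_state_vector` is a hypothetical name.

```python
# Offsets from RSTVADDR. 0xA0 is stated in the text; the reset offsets are
# assumed from the SPARC V9 RED_state trap table layout (32 bytes per entry).
RED_STATE_OFFSETS = {
    "POR": 0x20,
    "WDR": 0x40,
    "XIR": 0x60,
    "SIR": 0x80,
    "non_reset_trap": 0xA0,
}


def red_state_vector(rstvaddr: int, kind: str) -> int:
    """Entry address used for a trap processed in RED_state."""
    return rstvaddr + RED_STATE_OFFSETS[kind]
```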
12.2.1.3 RED_state Software Considerations

In effect, RED_state reserves one level of the trap stack for recovery and reset processing. Hyperprivileged software should be designed to require only MAXTL - 1 trap levels for normal processing. That is, any trap that causes TL = MAXTL is an exceptional condition that should cause entry to RED_state.











Programming Note: To log the state of the virtual processor, RED_state-handler software needs either a spare register or a preloaded pointer to a save area. To support recovery, the operating system might reserve one of the hyperprivileged scratchpad registers for use in RED_state.





12.2.1.4 Usage of Trap Levels 


If MAXPTL = 2 and MAXTL = 5 in an UltraSPARC Architecture implementation, the trap 
levels might be used as shown in TABLE 12-2. 


TABLE 12-2 Typical Usage for Trap Levels

TL   Corresponding Execution Mode   Usage
0    Nonprivileged                  Normal execution
1    Privileged                     System calls; interrupt handlers; instruction emulation
2    Privileged                     Window spill/fill handler
3    Hyperprivileged                Real address TLB miss handler
4    Hyperprivileged                Reserved for error handling
5    Hyperprivileged                RED_state handler





12.2.2 error_state 


The virtual processor enters error_state when a trap occurs while the virtual 
processor is already at its maximum supported trap level — that is, it enters 
error_state when a trap occurs while TL = MAXTL. No other conditions cause 
entry into error_state on an UltraSPARC Architecture virtual processor. (impl. 
dep. #39-V8-Cs10) 


IMPL. DEP. #40-V8: Effects when error_state is entered are implementation- 
dependent, but it is recommended that as much processor state as possible be 
preserved upon entry to error_state. In addition, an UltraSPARC Architecture 
virtual processor may have other error_state entry traps that are implementation 
dependent. 


Upon entering error_state, a virtual processor automatically generates a 
watchdog_reset (WDR) (impl. dep. #254-U3-Cs10), which causes entry into 
RED_state. 
12.3 Trap Categories

An exception, error, or interrupt request can cause any of the following trap types:

■ Precise trap
■ Deferred trap
■ Disrupting trap
■ Reset trap

12.3.1 Precise Traps

A precise trap is induced by a particular instruction and occurs before any program-visible state has been changed by the trap-inducing instruction. When a precise trap occurs, several conditions must be true:

■ The PC saved in TPC[TL] points to the instruction that induced the trap and the NPC saved in TNPC[TL] points to the instruction that was to be executed next.
■ All instructions issued before the one that induced the trap have completed execution.
■ Any instructions issued after the one that induced the trap remain unexecuted.

Among the actions that trap handler software might take when processing a precise trap are:

■ Return to the instruction that caused the trap and reexecute it by executing a RETRY instruction (PC ← old PC, NPC ← old NPC).
■ Emulate the instruction that caused the trap and return to the succeeding instruction by executing a DONE instruction (PC ← old NPC, NPC ← old NPC + 4).
■ Terminate the program or process associated with the trap.

12.3.2 Deferred Traps


A deferred trap is also induced by a particular instruction, but unlike a precise trap, a
deferred trap may occur after program-visible state has been changed. Such state
may have been changed by the execution either of the trap-inducing instruction
itself or of one or more other instructions.


There are two classes of deferred traps: 


■ Termination deferred traps: The instruction (usually a store) that caused the trap has passed the retirement point of execution (the TPC has been updated to point to an instruction beyond the one that caused the trap). The trap condition is an error that prevents the instruction from completing and its results becoming globally visible. A termination deferred trap has high trap priority, second only to the priority of resets.

Programming Note: Not enough state is saved for execution of the instruction stream to resume with the instruction that caused the trap. Therefore, the trap handler must terminate the process containing the instruction that caused the trap.

■ Restartable deferred traps: The program-visible state has been changed by the trap-inducing instruction or by one or more other instructions after the trap-inducing instruction.

SPARC V9 Compatibility Note: A restartable deferred trap is the "deferred trap" defined in the SPARC V9 specification.


The fundamental characteristic of a restartable deferred trap is that the state of the 
virtual processor on which the trap occurred may not be consistent with any precise 
point in the instruction sequence being executed on that virtual processor. When a 
restartable deferred trap occurs, TPC[TL] and TNPC[TL] contain a PC value and an 
NPC value, respectively, corresponding to a point in the instruction sequence being 
executed on the virtual processor. This PC may correspond to the trap-inducing 
instruction or it may correspond to an instruction following the trap-inducing 
instruction. With a restartable deferred trap, program-visible updates may be 
missing from instructions prior to the instruction to which TPC[TL] refers. The 
missing updates are limited to instructions in the range from (and including) the 
actual trap-inducing instruction up to (but not including) the instruction to which 
TPC[TL] refers. By definition, the instruction to which TPC[TL] refers has not yet
executed; therefore, it cannot have any updates, missing or otherwise.
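The bound on missing updates can be made concrete with a trace of executed instruction addresses. In this hypothetical sketch, `trace` lists the addresses in program order (so branches are already resolved), and `possibly_missing_updates` returns the instructions whose program-visible updates may be missing: from the trap-inducing instruction up to, but not including, the one at TPC[TL]. When TPC[TL] points at the trap-inducing instruction itself, the range is empty, consistent with that instruction not yet having executed.

```python
from typing import List


def possibly_missing_updates(trace: List[int], trap_pc: int, tpc: int) -> List[int]:
    """Addresses whose updates may be missing after a restartable deferred trap.

    trace   -- executed instruction addresses in program order (hypothetical input)
    trap_pc -- address of the actual trap-inducing instruction
    tpc     -- value saved in TPC[TL]
    """
    start = trace.index(trap_pc)   # range includes the trap-inducing instruction
    end = trace.index(tpc, start)  # and excludes the instruction at TPC[TL]
    return trace[start:end]
```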


With a restartable deferred trap there must exist sufficient information to report the 
error that caused the deferred trap. If system software can recover from the error 
that caused the deferred trap, then there must be sufficient information to generate a 
consistent state within the processor so that execution can resume. Included in that 
information must be an indication of the mode (nonprivileged, privileged, or 
hyperprivileged) in which the trap-inducing instruction was issued. 


How the information necessary for repairing the state to make it consistent is
maintained, and how the state is repaired to a consistent state, are implementation
dependent. It is also implementation dependent whether execution resumes at the 
point of the trap-inducing instruction or at an arbitrary point between the trap- 
inducing instruction and the instruction pointed to by the TPC[TL], inclusively. 


Associated with a particular restartable deferred trap implementation, the following must exist:

■ An instruction that causes a potentially outstanding restartable deferred trap exception to be taken as a trap.
■ Instructions with sufficient privilege to access the state information needed by software to emulate the restartable deferred trap-inducing instruction and to resume execution of the trapped instruction stream.

Programming Note: Resuming execution may require the emulation of instructions that had not completed execution at the time of the restartable deferred trap, that is, those instructions in the deferred-trap queue.

Software should resume execution starting at the instruction to which TPC[TL] refers. Hardware should provide enough information for software to recreate virtual processor state and update it to the point just before execution of the instruction to which TPC[TL] refers. After software has updated virtual processor state up to that point, it can then resume execution by issuing a RETRY instruction.

IMPL. DEP. #32-V8-Ms10: Whether any restartable deferred traps (and, possibly, associated deferred-trap queues) are present is implementation dependent.

Among the actions software can take after a restartable deferred trap are these:

■ Emulate the instruction that caused the exception, emulate or cause to execute any other execution-deferred instructions that were in an associated restartable deferred trap state queue, and use RETRY to return control to the instruction at which the deferred trap was invoked.
■ Terminate the program or process associated with the restartable deferred trap.

A deferred trap (of either of the two classes) is always delivered to the virtual processor in hyperprivileged mode.

12.3.3 Disrupting Traps


12.3.3.1 Disrupting versus Precise and Deferred Traps 


A disrupting trap is caused by a condition (for example, an interrupt) rather than 
directly by a particular instruction. This distinguishes it from precise and deferred 
traps. 


When a disrupting trap has been serviced, trap handler software normally arranges 
for program execution to resume where it left off. This distinguishes disrupting traps 
from reset traps, since a reset trap vectors to a unique reset address and execution of 
the program that was running when the reset occurred is generally not expected to 
resume. 


When a disrupting trap occurs, the following conditions are true: 


1. The PC saved in TPC[TL] points to an instruction in the disrupted program 
stream and the NPC value saved in TNPC[TL] points to the instruction that was 
to be executed after that one. 
2. All instructions issued before the instruction indicated by TPC[TL] have
retired. 


3. The instruction to which TPC[TL] refers and any instruction(s) that were 
issued after it remain unexecuted. 


A disrupting trap may be due to an interrupt request directly related to a 
previously-executed instruction; for example, when a previous instruction sets a bit 
in the SOFTINT register. 


12.3.3.2 Causes of Disrupting Traps 


A disrupting trap may occur due to either an interrupt request or an error not 
directly related to instruction processing. The source of an interrupt request may be 
either internal or external. An interrupt request can be induced by the assertion of a 
signal not directly related to any particular virtual processor or memory state, for 
example, the assertion of an "I/O done" signal. 


A condition that causes a disrupting trap persists until the condition is cleared. 


12.3.3.3 Conditioning of Disrupting Traps 


How disrupting traps are conditioned is affected by: 


m The privilege mode in effect when the trap is outstanding, just before the trap is 
actually taken (regardless of the privilege mode that was in effect when the 
exception was detected). 


m The privilege mode in which the trap is destined to be delivered.


Outstanding in Nonprivileged or Privileged mode, destined for delivery in
Privileged mode. An outstanding disrupting trap condition in either
nonprivileged mode or privileged mode and destined for delivery to privileged
mode is held pending while the Interrupt Enable (ie) field of PSTATE is zero
(PSTATE.ie = 0). interrupt_level_n interrupts are further conditioned by the
Processor Interrupt Level (PIL) register: such an interrupt is held pending while
either PSTATE.ie = 0 or the condition's interrupt level is less than or equal to the
level specified in PIL. When delivery of this disrupting trap is enabled by
PSTATE.ie = 1, it is delivered to the virtual processor in privileged mode if
TL < MAXPTL (2, in UltraSPARC Architecture 2005 implementations) or in
hyperprivileged mode if TL ≥ MAXPTL.


Outstanding in Hyperprivileged mode, destined for delivery in Privileged
mode. An outstanding disrupting trap condition detected while in
hyperprivileged mode and destined for delivery in privileged mode is held pending
while in hyperprivileged mode (HPSTATE.hpriv = 1), regardless of the values of TL
and PSTATE.ie.
Outstanding in Nonprivileged or Privileged mode, destined for delivery in
Hyperprivileged mode. An outstanding disrupting trap condition detected while
in either nonprivileged mode or privileged mode and destined for delivery in
hyperprivileged mode is never masked; it is delivered immediately.


Outstanding in Hyperprivileged mode, destined for delivery in 
Hyperprivileged mode. An outstanding disrupting trap condition detected in 
hyperprivileged mode and destined to be delivered in hyperprivileged mode is 
masked and held pending while PSTATE.ie = 0. 


The above is summarized in TABLE 12-3. 


TABLE 12-3 Conditioning of Disrupting Traps

Disposition of disrupting traps, based on privilege mode:

Type of Disrupting Trap Condition | Current Virtual Processor Privilege Mode | Destined for Delivery in Privileged Mode | Destined for Delivery in Hyperprivileged Mode
interrupt_level_n | Nonprivileged or Privileged | Held pending while PSTATE.ie = 0 or interrupt level ≤ PIL | —
interrupt_level_n | Hyperprivileged | Held pending while HPSTATE.hpriv = 1 | —
All other disrupting traps | Nonprivileged or Privileged | Held pending while PSTATE.ie = 0 | Delivered immediately
All other disrupting traps | Hyperprivileged | Held pending while HPSTATE.hpriv = 1 | Held pending while PSTATE.ie = 0
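The rules above and TABLE 12-3 can be condensed into a small decision function. The sketch below is illustrative only: the `Mode` enum, the function names, and the string results are hypothetical, and it models just the masking decision plus the TL < MAXPTL check (MAXPTL = 2, as stated for UltraSPARC Architecture 2005), not how hardware actually tracks pending conditions.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    NONPRIVILEGED = 0
    PRIVILEGED = 1
    HYPERPRIVILEGED = 2


def disrupting_disposition(current: Mode, destined: Mode, pstate_ie: int,
                           level: Optional[int] = None, pil: int = 0) -> str:
    """Return "deliver" or "pending" for one outstanding disrupting trap.

    level is the interrupt level of an interrupt_level_n condition and
    None for every other kind of disrupting trap.
    """
    if destined is Mode.HYPERPRIVILEGED:
        if level is not None:
            raise ValueError("interrupt_level_n is never destined for hyperprivileged mode")
        if current is Mode.HYPERPRIVILEGED:
            # Masked and held pending while PSTATE.ie = 0.
            return "deliver" if pstate_ie else "pending"
        # From nonprivileged or privileged mode: never masked.
        return "deliver"

    # Destined for privileged mode.
    if current is Mode.HYPERPRIVILEGED:
        # Held pending while HPSTATE.hpriv = 1, regardless of TL and PSTATE.ie.
        return "pending"
    if not pstate_ie:
        return "pending"
    if level is not None and level <= pil:
        # interrupt_level_n is additionally masked by PIL.
        return "pending"
    return "deliver"


def privileged_trap_delivery_mode(tl: int, maxptl: int = 2) -> Mode:
    """Mode that actually receives an enabled trap destined for privileged mode."""
    return Mode.PRIVILEGED if tl < maxptl else Mode.HYPERPRIVILEGED
```

For example, an interrupt_level_n request at level 5 with PIL = 5 stays pending even with PSTATE.ie = 1, while the same request at level 6 is delivered.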


12.3.3.4 Trap Handler Actions for Disrupting Traps


Among the actions that trap-handler software might take to process a disrupting
trap are:


m Use RETRY to return to the instruction at which the trap was invoked
(PC ← old PC, NPC ← old NPC).


m Terminate the program or process associated with the trap. 


12.3.3.5 Clearing Requirement for Disrupting Traps


For each disrupting trap, a method must be provided for hyperprivileged software
(or a service processor, if present) to detect and clear the pending disrupting trap
without taking its corresponding hardware trap.
12.3.4 Reset Traps


A reset trap occurs when hyperprivileged software or the implementation's hardware 
determines that the machine must be reset to a known state. Reset traps differ from 
disrupting traps in that: 

m They are not maskable. 


m Trap handler software for resets is generally not expected to resume execution of 
the program that was running when the reset trap occurred. After an SIR or XIR, 
execution of the interrupted program may resume. 


All reset traps are delivered to the virtual processor in hyperprivileged mode. 


IMPL. DEP. #37-V8: Some of a virtual processor's behavior during a reset trap is 
implementation dependent. See RED state Trap Processing on page 490 for details. 


The following reset traps are defined by the SPARC V9 architecture: 


m Power-on reset (POR) — Used for initialization purposes (for example, when 
power is applied or reapplied to the virtual processor). 


m Watchdog reset (WDR) — Initiated when the virtual processor enters
error_state (impl. dep. #254-U3-Cs10). The WDR reset trap is taken instead of
the trap request that caused entry to error_state at TL = MAXTL.
TSTATE[MAXTL], TPC[MAXTL], TNPC[MAXTL], and TT[MAXTL] observed after a WDR
reset trap are those associated with the trap request that caused entry to
error_state. The value of TT[MAXTL] indicates the trap type of this trap.
Machine state is consistent; however, software should not resume normal
instruction processing at the address in TPC[TL] after the WDR reset trap. The
values in TSTATE[MAXTL], TPC[MAXTL], TNPC[MAXTL], and TT[MAXTL] are accurate
and are intended for debug purposes.


m Externally initiated reset (XIR) — Initiated in response to a signal or event that is
external to the virtual processor. This reset trap is normally used for critical
system events, such as power failure. The XIR reset trap is treated as an interrupt
and processed similarly to a disrupting trap (but without masking). Software can
resume the interrupted program at the conclusion of trap handler execution.


m Software-initiated reset (SIR) — Initiated by software by executing the SIR
instruction in hyperprivileged mode. In nonprivileged and privileged mode, the
SIR instruction causes an illegal instruction exception (which results in a trap to
hyperprivileged mode). The SIR reset trap is processed similarly to a precise trap.
The PC saved in TPC[TL] points to the SIR instruction. If the SIR reset is detected
when TL = MAXTL, the virtual processor enters error_state and triggers a WDR reset.


12.3.5 Uses of the Trap Categories 


The SPARC V9 trap model stipulates the following: 
Reset traps (except software-initiated reset traps) occur asynchronously to
program execution.


When recovery from an exception can affect the interpretation of subsequent 
instructions, such exceptions shall be precise. See TABLE 12-4, TABLE 12-5, and 
Exception and Interrupt Descriptions on page 497 for identification of which traps 
are precise. 


In an UltraSPARC Architecture implementation, all exceptions that occur as the
result of program execution are precise, except for errors on store instructions that
occur after the store instruction has passed the retirement point (impl. dep.
#33-V8-Cs10).


An error detected after the initial access of a multiple-access load instruction (for 
example, LDTX or LDBLOCKF) should be precise. Thus, a trap due to the second 
memory access can occur. However, the processor state should not have been 
modified by the first access. 


Exceptions caused by external events unrelated to the instruction stream, such as 
interrupts, are disrupting. 


A deferred trap may occur one or more instructions after the trap-inducing 
instruction is dispatched. 





12.4 Trap Control


Several registers control how any given exception is processed, for example: 


The interrupt enable (ie) field in PSTATE and the Processor Interrupt Level (PIL) 
register control interrupt processing. See Disrupting Traps on page 463 for details. 


The enable floating-point unit (fef) field in FPRS, the floating-point unit enable
(pef) field in PSTATE, and the trap enable mask (tem) in the FSR control
floating-point traps.


The hyperprivileged mode (hpriv) field in the HPSTATE register can affect how a
trap is delivered. See Conditioning of Disrupting Traps on page 464 for details.


The TL register, which contains the current level of trap nesting, controls whether 
a trap causes entry to execute_state, RED_state, or error_state. It also 
affects whether the trap is processed in privileged mode or hyperprivileged 
mode. 
m For a trap delivered to the virtual processor in privileged mode, PSTATE.tle
determines whether implicit data accesses in the trap handler routine will be
performed using big-endian or little-endian byte order. A trap delivered to the
virtual processor in hyperprivileged mode always uses a default byte order of
big-endian.


Between the execution of instructions, the virtual processor prioritizes the 
outstanding exceptions, errors, and interrupt requests. At any given time, only the 
highest-priority exception, error, or interrupt request is taken as a trap. When there 
are multiple interrupts outstanding, the interrupt with the highest interrupt level is 
selected. When there are multiple outstanding exceptions, errors, and/or interrupt 
requests, a trap occurs based on the exception, error, or interrupt with the highest 
priority (numerically lowest priority number in TABLE 12-5). See Trap Priorities on 
page 485. 
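As a rough model of this selection rule, the sketch below picks the outstanding condition with the numerically lowest priority number. It assumes the caller already knows each condition's priority from TABLE 12-5; the hypothetical helper `interrupt_priority` encodes that table's 32 - n entry for interrupt_level_n, which is what makes a higher interrupt level win. Because relative priorities within a level are implementation dependent, the tie-break here (first entry in the dictionary) is arbitrary.

```python
from typing import Dict, Optional


def interrupt_priority(level: int) -> int:
    """Priority number of interrupt_level_n from TABLE 12-5 (32 - n)."""
    if not 1 <= level <= 15:
        raise ValueError("interrupt level must be 1..15")
    return 32 - level


def select_trap(outstanding: Dict[str, float]) -> Optional[str]:
    """Name of the condition taken as a trap: lowest priority number wins.

    Returns None when nothing is outstanding.
    """
    if not outstanding:
        return None
    return min(outstanding, key=lambda name: outstanding[name])
```

For instance, with division_by_zero (15), fp_disabled (8), and interrupt_level_14 (18) all outstanding, fp_disabled is taken.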


12.4.1 PIL Control 


When an interrupt request occurs, the virtual processor compares its interrupt
request level against the value in the Processor Interrupt Level (PIL) register. If the
interrupt request level is greater than PIL and no higher-priority exception is
outstanding, then the virtual processor takes a trap using the appropriate
interrupt_level_n trap vector.
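The PIL comparison can be sketched as follows. The `pending_mask` argument is a hypothetical encoding (bit n set means an interrupt at level n is requested, n = 1 to 15) chosen for illustration, and the check for a higher-priority exception is left to the caller. PSTATE.ie is included because, as described under Disrupting Traps, it also holds these interrupts pending.

```python
from typing import Optional


def interrupt_to_take(pending_mask: int, pil: int, pstate_ie: int) -> Optional[int]:
    """Return the interrupt level that would be taken as a trap, or None.

    pending_mask is a hypothetical encoding: bit n set means an
    interrupt_level_n request is outstanding (n = 1..15).
    """
    if not pstate_ie:
        return None  # all interrupt_level_n requests are held pending
    for level in range(15, 0, -1):  # highest interrupt level first
        if pending_mask & (1 << level) and level > pil:
            return level
    return None  # every outstanding level is <= PIL
```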


12.4.2 FSR.tem Control


The occurrence of floating-point traps of type IEEE 754 exception can be controlled
with the user-accessible trap enable mask (tem) field of the FSR. If a particular bit of
FSR.tem is 1, the associated IEEE 754 exception can cause an
fp_exception_ieee_754 trap.


If a particular bit of FSR.tem is 0, the associated IEEE 754 exception does not cause
an fp_exception_ieee_754 trap. Instead, the occurrence of the exception is recorded
in the FSR's accrued exception field (aexc).


If an IEEE 754 exception results in an fp_exception_ieee_754 trap, then the
destination F register, FSR.fccn, and FSR.aexc fields remain unchanged. However,
if an IEEE 754 exception does not result in a trap, then the F register, FSR.fccn, and
FSR.aexc fields are updated to their new values.
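A simplified model of the FSR.tem rule follows. It treats the IEEE 754 exceptions raised by one operation as a 5-bit mask and assumes the nv, of, uf, dz, nx ordering (bit 4 down to bit 0) used in the SPARC FSR exception fields; it does not model FSR.fccn, FSR.cexc, or the finer rules for overflow and underflow combined with inexact. `FpState` and `complete_fp_op` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed bit positions within each 5-bit IEEE 754 exception field.
NV, OF, UF, DZ, NX = 0x10, 0x08, 0x04, 0x02, 0x01


@dataclass
class FpState:
    tem: int      # FSR.tem: 1 = trap on that exception
    aexc: int     # FSR.aexc: accrued (sticky) exceptions
    dest: float   # stand-in for the destination F register


def complete_fp_op(state: FpState, exc: int, result: float) -> bool:
    """Finish one FPop that raised the IEEE 754 exceptions in exc.

    Returns True if an fp_exception_ieee_754 trap is taken, in which case
    aexc and the destination are left unchanged. Otherwise the exceptions
    are accrued in aexc and the result is written.
    """
    if exc & state.tem:
        return True
    state.aexc |= exc
    state.dest = result
    return False
```

With only the division-by-zero trap enabled, a divide by zero traps and leaves state untouched, whereas an inexact result is merely accrued.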
12.5 Trap-Table Entry Addresses


Traps are delivered to the virtual processor in either privileged mode or
hyperprivileged mode, depending on the trap type, the value of TL at the time the
trap is taken, and the privilege mode at the time the exception was detected. See
TABLE 12-4 on page 475 and TABLE 12-5 on page 481 for details.


Unique trap table base addresses are provided for traps being delivered in
privileged mode and in hyperprivileged mode.


12.5.1 Trap-Table Entry Address to Privileged Mode


Privileged software initializes bits 63:15 of the Trap Base Address (TBA) register (its 
most significant 49 bits) with bits 63:15 of the desired 64-bit privileged trap-table 
base address. 


At the time a trap to privileged mode is taken:

m Bits 63:15 of the trap vector address are taken from TBA{63:15}.

m Bit 14 of the trap vector address (the "TL>0" field) is set based on the value of TL
just before the trap is taken; that is, if TL = 0 then bit 14 is set to 0, and if TL > 0
then bit 14 is set to 1.

m Bits 13:5 of the trap vector address contain a copy of the contents of the TT
register (TT[TL]).

m Bits 4:0 of the trap vector address are always 0; hence, each trap table entry is at
least 2⁵, or 32, bytes long. Each entry in the trap table may contain the first eight
instructions of the corresponding trap handler.


FIGURE 12-3 illustrates the trap vector address for a trap delivered to privileged
mode. In FIGURE 12-3, the "TL>0" bit is 0 if TL = 0 when the trap was taken, and 1 if
TL > 0 when the trap was taken. This implies, as detailed in the following section,
that there are two trap tables for traps to privileged mode: one for traps from TL = 0
and one for traps from TL > 0.


Bits:     63:15                          | 14   | 13:5   | 4:0
Contents: TBA{63:15} (TBA.tba_high49)    | TL>0 | TT[TL] | 0 0000


FIGURE 12-3 Privileged Mode Trap Vector Address
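The bit assembly described in the list above is straightforward to express in code. This is a sketch for checking arithmetic, not a hardware description; `privileged_trap_vector` is a hypothetical helper, and 64-bit addresses are modeled as Python integers.

```python
MASK64 = (1 << 64) - 1


def privileged_trap_vector(tba: int, tl_before_trap: int, tt: int) -> int:
    """Trap vector address for a trap delivered to privileged mode.

    tl_before_trap is TL just before the trap; tt is the 9-bit TT[TL].
    """
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT[TL] is a 9-bit value")
    base = tba & MASK64 & ~0x7FFF                      # bits 63:15 from TBA{63:15}
    tl_bit = (1 if tl_before_trap > 0 else 0) << 14    # the "TL>0" bit
    return base | tl_bit | (tt << 5)                   # bits 4:0 are always 0
```

For TBA = 1_0000_8000₁₆ and TT = 024₁₆, this yields 1_0000_8480₁₆ from TL = 0 and 1_0000_C480₁₆ from TL > 0.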
12.5.2 Privileged Trap Table Organization
The layout of the privileged-mode trap table (which is accessed using virtual
addresses) is illustrated in FIGURE 12-4.
Value of TL (before the trap) | Software Trap Type | Hardware Trap Type (TT[TL]) | Trap Table Offset (from TBA) | Contents of Trap Table
TL = 0 | — | 000₁₆-07F₁₆ | 0000₁₆-0FE0₁₆ | Hardware traps
TL = 0 | — | 080₁₆-0FF₁₆ | 1000₁₆-1FE0₁₆ | Spill/fill traps
TL = 0 | 0₁₆-7F₁₆ | 100₁₆-17F₁₆ | 2000₁₆-2FE0₁₆ | Software traps to privileged level
TL = 0 | — | 180₁₆-1FF₁₆ | 3000₁₆-3FE0₁₆ | unassigned
TL ≥ 1 (≤ MAXPTL-1) | — | 000₁₆-07F₁₆ | 4000₁₆-4FE0₁₆ | Hardware traps
TL ≥ 1 (≤ MAXPTL-1) | — | 080₁₆-0FF₁₆ | 5000₁₆-5FE0₁₆ | Spill/fill traps
TL ≥ 1 (≤ MAXPTL-1) | 0₁₆-7F₁₆ | 100₁₆-17F₁₆ | 6000₁₆-6FE0₁₆ | Software traps to privileged level
TL ≥ 1 (≤ MAXPTL-1) | — | 180₁₆-1FF₁₆ | 7000₁₆-7FE0₁₆ | unassigned

FIGURE 12-4 Privileged-mode Trap Table Layout
The trap table for TL = 0 comprises 512 thirty-two-byte entries; the trap table for 
TL > 0 comprises 512 more thirty-two-byte entries. Therefore, the total size of a full 
privileged trap table is 2 x 512 x 32 bytes (32 Kbytes). However, if privileged 
software does not use software traps (Tcc instructions) at TL > 0, the table can be 
made 24 Kbytes long. 
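The table sizes quoted above follow directly from the offset formula. The sketch below (hypothetical names, Python integers standing in for addresses) computes offsets from TBA, classifies a TT value by the regions of FIGURE 12-4, and reproduces the 32-Kbyte and 24-Kbyte figures.

```python
ENTRY_BYTES = 32          # bits 4:0 of the vector are always 0
ENTRIES_PER_TABLE = 512   # one entry per 9-bit TT value


def privileged_table_offset(tl_before_trap: int, tt: int) -> int:
    """Offset from TBA for a trap delivered to privileged mode."""
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT is a 9-bit value")
    return ((1 if tl_before_trap > 0 else 0) << 14) | (tt << 5)


def privileged_region(tt: int) -> str:
    """Contents of the privileged trap-table entry for TT (FIGURE 12-4)."""
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT is a 9-bit value")
    if tt <= 0x07F:
        return "hardware trap"
    if tt <= 0x0FF:
        return "spill/fill trap"
    if tt <= 0x17F:
        return "software trap to privileged level"
    return "unassigned"


# Two 512-entry tables: one for traps from TL = 0, one for traps from TL > 0.
FULL_TABLE_BYTES = 2 * ENTRIES_PER_TABLE * ENTRY_BYTES
# With no Tcc at TL > 0, the table can stop where the TL > 0
# software-trap region (TT 100-17F) would begin.
TRIMMED_TABLE_BYTES = privileged_table_offset(1, 0x100)
```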
12.5.3 Trap-Table Entry Address to Hyperprivileged Mode


Hyperprivileged software initializes bits 63:14 of the Hyperprivileged Trap Base 
Address (HTBA) register (its most significant 50 bits) with bits 63:14 of the desired 
64-bit hyperprivileged trap table base address. 


At the time a trap to hyperprivileged mode is taken:

m Bits 63:14 of the trap vector address are taken from HTBA{63:14}.

m Bits 13:5 of the trap vector address contain a copy of the contents of the TT
register (TT[TL]).

m Bits 4:0 of the trap vector address are always 0; hence, each trap table entry is at
least 2⁵, or 32, bytes long. Each entry in the trap table may contain the first eight
instructions of the corresponding trap handler.
FIGURE 12-5 illustrates the trap vector address used for a trap delivered to
hyperprivileged mode.

Bits:     63:14                              | 13:5   | 4:0
Contents: HTBA{63:14} (HTBA.htba_high50)     | TT[TL] | 0 0000


FIGURE 12-5 Hyperprivileged Mode Trap Vector Address
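For comparison with the privileged case, here is a sketch of the hyperprivileged vector computation: there is no TL>0 bit, so HTBA supplies bits 63:14 and a single 512-entry table covers all TT values. The function name is hypothetical and addresses are modeled as Python integers.

```python
MASK64 = (1 << 64) - 1

# 512 entries of 32 bytes each.
HYPERPRIVILEGED_TABLE_BYTES = 512 * 32


def hyperprivileged_trap_vector(htba: int, tt: int) -> int:
    """Trap vector address for a trap delivered to hyperprivileged mode."""
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT[TL] is a 9-bit value")
    # Bits 63:14 from HTBA{63:14}, bits 13:5 from TT[TL], bits 4:0 zero.
    return (htba & MASK64 & ~0x3FFF) | (tt << 5)
```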


12.5.4 Hyperprivileged Trap Table Organization


The layout of the hyperprivileged-mode trap table (which is accessed using physical 
addresses) is illustrated in FIGURE 12-6. 


Software Trap Type | Hardware Trap Type (TT[TL]) | Trap Table Offset (from HTBA) | Contents of Trap Table
— | 000₁₆-07F₁₆ | 0000₁₆-0FE0₁₆ | Hardware traps
— | 080₁₆-0FF₁₆ | 1000₁₆-1FE0₁₆ | Spill/fill traps
0₁₆-7F₁₆ | 100₁₆-17F₁₆ | 2000₁₆-2FE0₁₆ | Software traps from hyperprivileged level …
80₁₆-FF₁₆ | 180₁₆-1FF₁₆ | 3000₁₆-3FE0₁₆ | Software traps to hyperprivileged level

FIGURE 12-6 Hyperprivileged-mode Trap Table Layout


The hyperprivileged trap table comprises 512 thirty-two-byte entries. Therefore, the 
total size of a full hyperprivileged trap table is 512 x 32 bytes (16 Kbytes). 


12.5.5 Trap Table Entry Address to RED_state





Traps occurring in RED_state or traps that cause the virtual processor to enter
RED_state use an abbreviated trap vector, called the RED_state trap vector.











The RED_state trap vector is located at the following address, referred to as
RSTVADDR (impl. dep. #114-V9-Cs10):


Physical address RSTVADDR = FFFF FFFF F000 0000₁₆
(highest 256 MB of physical address space)


In an implementation that implements fewer than 64 bits of physical addressing,
unimplemented high-order bits of the above RSTVADDR are ignored.
FIGURE 12-7 illustrates the trap vector address used for a trap delivered to
RED_state (in hyperprivileged mode).


Bits:     63:48  | 47:32  | 31:16  | 15:14 | 13:5   | 4:0
Contents: FFFF₁₆ | FFFF₁₆ | F000₁₆ | 00    | TT[TL] | 0 0000


FIGURE 12-7 RED_state Trap Vector Address





12.5.6 RED_state Trap Table Organization





The RED_state trap table is constructed so that it can overlay the hyperprivileged
trap table (see FIGURE 12-6) if necessary. For a trap to RED_state, the trap table
offset is added to the base address RSTVADDR to yield the RED_state trap vector.
FIGURE 12-8 illustrates the layout of the RED_state trap table.











TT (Trap Type) | Trap Table Offset (from RSTVADDR) | Contents of Trap Table
0 | 00₁₆ | Reserved
1 | 20₁₆ | Power-on reset (POR)
TT† | 40₁₆ | Watchdog reset (WDR)
TT‡ | 60₁₆ | Externally initiated reset (XIR)
4 | 80₁₆ | Software-initiated reset (SIR)
TT* | A0₁₆ | All other exceptions in RED_state

† TT = trap type of the exception that caused entry into error_state.
‡ TT = 3 if an externally initiated reset (XIR) occurs while the virtual processor is not in
error_state; TT = trap type of the exception that caused entry into error_state if the
externally initiated reset occurs in error_state.
* TT = trap type of the exception. See TABLE 12-4 on page 475.

FIGURE 12-8 RED_state Trap Table Layout
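The RED_state vectors can be computed from RSTVADDR and the fixed offsets in FIGURE 12-8. In this sketch the cause names are hypothetical string labels, and the `pa_bits` parameter models an implementation with fewer than 64 physical address bits by discarding the unimplemented high-order bits. The offset depends only on the kind of reset; the value written to TT follows the footnotes above, not the offset.

```python
RSTVADDR = 0xFFFF_FFFF_F000_0000  # highest 256 MB of a 64-bit physical space

# Offsets from RSTVADDR, per FIGURE 12-8.
RED_STATE_OFFSETS = {"POR": 0x20, "WDR": 0x40, "XIR": 0x60, "SIR": 0x80}
OTHER_EXCEPTION_OFFSET = 0xA0  # all other exceptions in RED_state


def red_state_vector(cause: str, pa_bits: int = 64) -> int:
    """Physical address of the RED_state trap vector for a given cause."""
    offset = RED_STATE_OFFSETS.get(cause, OTHER_EXCEPTION_OFFSET)
    # Unimplemented high-order physical address bits are ignored.
    return (RSTVADDR + offset) & ((1 << pa_bits) - 1)
```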





12.5.7 Trap Type (TT)


When a normal trap occurs, a value that uniquely identifies the type of the trap is
written into the current 9-bit TT register (TT[TL]) by hardware. Control is then
transferred into the trap table to an address formed by one of the following,
depending on the trap's destination privilege mode:

m The TBA register, (TL > 0), and TT[TL] (see Trap-Table Entry Address to Privileged
Mode on page 469)

m The HTBA register and TT[TL] (see Trap-Table Entry Address to Hyperprivileged
Mode on page 470)
When a RED_state trap occurs, the TT register is set as described in RED_state on
page 457. Control is then transferred into the RED_state trap table at an address
formed by RSTVADDR and an offset depending on the condition.


Programming Note: The spill_n_*, fill_n_*, clean_window, and MMU-related traps
(fast_instruction_access_MMU_miss, fast_data_access_MMU_miss, and
fast_data_access_protection) are spaced such that their trap-table entries are
128 bytes (32 instructions) long in the UltraSPARC Architecture. This length allows
the complete code for one spill/fill routine, a clean_window routine, or a normal
MMU miss handling routine to reside in one trap-table entry.


TT values 000₁₆-0FF₁₆ are reserved for hardware traps. TT values 100₁₆-17F₁₆ are
reserved for software traps (caused by execution of a Tcc instruction) to
privileged-mode trap handlers. TT values 180₁₆-1FF₁₆ are used for software traps to
trap handlers operating in hyperprivileged mode.


IMPL. DEP. #35-V8-Cs20: TT values 060₁₆ to 07F₁₆ were reserved for
implementation_dependent_exception_n exceptions in the SPARC V9 specification,
but are now all defined as standard UltraSPARC Architecture exceptions. See
TABLE 12-4 for details.
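The TT ranges above, together with the software trap numbering in FIGURE 12-4 and FIGURE 12-6 and the 128-byte spacing of the spill/fill entries, can be sketched as small helpers. The function names are hypothetical; `spill_normal_tt` relies on spill_n_normal occupying every fourth TT value from 080₁₆ (four 32-byte entries per 128-byte slot), which matches the 080₁₆-09C₁₆ range listed in TABLE 12-4.

```python
def tt_kind(tt: int) -> str:
    """Classify a 9-bit TT value by the ranges described above."""
    if not 0 <= tt <= 0x1FF:
        raise ValueError("TT is a 9-bit value")
    if tt <= 0x0FF:
        return "hardware"
    if tt <= 0x17F:
        return "software, privileged handler"
    return "software, hyperprivileged handler"


def software_trap_tt(sw_trap_number: int) -> int:
    """TT for software trap type 0-FF (FIGURE 12-4 and FIGURE 12-6)."""
    if not 0 <= sw_trap_number <= 0xFF:
        raise ValueError("software trap type is 0..FF")
    return 0x100 + sw_trap_number


def spill_normal_tt(n: int) -> int:
    """TT of spill_n_normal; each entry spans 4 TT slots (128 bytes)."""
    if not 0 <= n <= 7:
        raise ValueError("n is 0..7")
    return 0x080 + 4 * n
```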
The assignment of TT values to traps is shown in TABLE 12-4; TABLE 12-5 provides the
same list, but sorted in order of trap priority. The key to both tables follows:





Symbol Meaning 





● This trap type is associated with a feature that is architecturally required in an
implementation of UltraSPARC Architecture 2005. Hardware must detect this
exception or interrupt, trap on it (if not masked), and set the specified trap type
value in the TT register.


○ This trap type is associated with a feature that is architecturally defined in
UltraSPARC Architecture 2005, but its implementation is optional.


P Trap is taken via the Privileged trap table, in Privileged mode (PSTATE.priv = 1)


H Trap is taken via the Hyperprivileged trap table, in Hyperprivileged mode
(HPSTATE.hpriv = 1)


HU Trap is taken via the Hyperprivileged trap table, in Hyperprivileged mode
(HPSTATE.hpriv = 1). However, the trap is unexpected. While hardware can
legitimately generate this trap, it should not do so unless there is a programming
error or some other error. Therefore, occurrence of this trap indicates an actual
error to hyperprivileged software.


-X- Not possible. Hardware cannot generate this trap in the indicated running mode.
For example, all privileged instructions can be executed in both privileged and
hyperprivileged modes; therefore, a privileged_opcode trap cannot occur in
privileged or hyperprivileged mode.


— This trap is reserved for future use. 


(am) Always Masked — when the condition occurs in this privilege mode, it is always 
masked out (but remains pending). 


(ie) When the outstanding disrupting trap condition occurs in this privilege mode, it 
may be conditioned (masked out) by PSTATE.ie = 0 (but remains pending). 


(nm) Never Masked — when the condition occurs in this running mode, it is never 
masked out and the trap is always taken. 


(pend) Held Pending — the condition can occur in this running mode, but can't be 
serviced in this mode. Therefore, it is held pending until the mode changes to 
one in which the exception can be serviced. 
TABLE 12-4 Exception and Interrupt Requests, by TT Value (1 of 6)


UA-2005
●=Req'd.
○=Opt'l


Exception or Interrupt Request 
Reserved 


power on reset 


watchdog reset 


externally initiated reset 


software initiated reset 


Reserved 


RED state exception 


implementation-dependent 


store error 


instruction access exception 


instruction access error 


Reserved 


Reserved 


Reserved 


illegal instruction 


privileged opcode 


Reserved 


Reserved 


TT 
(Trap 
Type) 


00016 
00116 


TT^ 





Trap 


Category 


reset 


reset 


precise 


deferred 


precise 


precise 


precise 


precise 


Mode in which Trap is 
Delivered (and 
Conditioning Applied), 























Priority based on Current 
(0 = Privilege Mode 
High- 
est) NP Priv HP 
0 H H H 
(nm) (nm) (nm) 
1.2 H H H 
(nm) (nm) (nm) 
1.1 H H H 
(nm) (nm) (nm) 
1.3 -X- -X- H 
(nm) 
^ H H H 
(nm) (nm) (nm) 
2.01 H H H 
(nm) (nm) (nm) 
3 H H HU 
(nm) (nm) (nm) 
4 H H H 
(nm) (nm) (nm) 
6.2 H H H 
(nm) (nm) (nm) 
7 P -x- -x- 
(nm) 
TABLE 12-4 Exception and Interrupt Requests, by TT Value (2 of 6)


UA-2005
●=Req'd.
○=Opt'l


Exception or Interrupt Request 


Reserved 

fp disabled 

fp exception ieee 754 
fp exception other 

tag overflow? 

clean window 

Reserved 
division by zero 
internal processor error 
instruction invalid TSB entry 
data invalid TSB entry 


Reserved 


Reserved 

data access exception 
data access error 

data access protection 


mem address not aligned


TT 
(Trap 
Type) 


01816- 
01F16 
02016 
02116 
02216 
02316 
02416? 
02516- 


02716 
02816 





02946 


03246 


03316 





03416 
Trap
Category 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


precise 


Mode in which Trap is 
Delivered (and 
Conditioning Applied), 




















Priority based on Current 
(0 = Privilege Mode 
High- 
est) NP Priv HP 
8 P P HU 
(nm) (nm) (nm) 
11.1 P P HU 
(nm) (nm) (nm) 
11.1 P P HU 
(nm) (nm) (nm) 
14 P P HU 
(nm) (nm) (nm) 
10.1 P P HU 
(nm) (nm) (nm) 
15 P P HU 
(nm) (nm) (nm) 
+ H H H 
(nm) (nm) (nm) 
2.10 H H -x- 
(nm) (nm) 
1200 H H H 
(nm) (nm) (nm) 
12.01 H H HU 
(nm) (nm) (nm) 
12.10 H H H
(nm) (nm) (nm) 
12.07 H H H
(nm) (nm) (nm) 
10.2 H H HU 


TABLE 12-4 Exception and Interrupt Requests, by TT Value (3 of 6) 





UA-2005
●=Req'd.
○=Opt'l


Exception or Interrupt Request 


LDDF mem address not aligned





STDF mem address not aligned





privileged action 


LDQF mem address not aligned





STQF mem address not aligned





Reserved 
Reserved 


Reserved 


instruction real translation miss 


data real translation miss 


Reserved 


interrupt level n (n = 1-15) 


Reserved 


hstick match 


trap level zero 
Reserved 


PA watchpoint (RA watchpoint) 


Reserved 


03746 


03846 








Trap 
Category 
precise 
precise 
precise 


precise 


precise 


precise 


precise 


disrupting 


disrupting 


precise 


precise 


Priority 
(0- 
High- 
est) 


10.1 


Mode in which Trap is 
Delivered (and 
Conditioning Applied), 
based on Current 
Privilege Mode 





NP Priv HP 


H H HU 
(nm) (nm) (nm) 




















10.1 H H HU 
(nm) (nm) (nm) 
11.1 H H -x- 
(nm) (nm) 
10.1 H H HU 
(nm) (nm) (nm) 
10.1 H H HU 
(nm) (nm) (nm) 
2.08 H H -x- 
(nm) (nm) 
12.03 H H H 
(nm) (nm) (nm) 
32-n P P (pend) 
(31to (ie) (ie) 
17) 
16.01 H H H 
(nm) (nm) (ie)
2.02 H H -x- 
12.09 H H H 


(nm) (nm) (nm) 
TABLE 12-4 Exception and Interrupt Requests, by TT Value (4 of 6)

(UA-2005: ●=Req'd., ○=Opt'l, □=Impl-Specific. NP, Priv, HP: mode in which the trap is
delivered, and conditioning applied, based on the current privilege mode.)

UA-2005 | Exception or Interrupt Request | TT (Trap Type) | Trap Category | Priority (0 = Highest) | NP | Priv | HP
○ | VA_watchpoint | 062₁₆ | precise | 11.2 | P (nm) | P (nm) | -X-
● | fast_instruction_access_MMU_miss | 064₁₆† | precise | 2.08 | H (nm) | H (nm) | -X-
— | Reserved | 065₁₆-067₁₆ | | | | |
● | fast_data_access_MMU_miss | 068₁₆† | precise | 12.03 | H (nm) | H (nm) | H (nm)
— | Reserved | 069₁₆-06B₁₆ | | | | |
● | fast_data_access_protection | 06C₁₆† | precise | 12.07 | H (nm) | H (nm) | H (nm)
— | Reserved | 06D₁₆-06F₁₆ | | | | |
○ | implementation_dependent_exception_n (impl. dep. #35-V8-Cs20) | 070₁₆-075₁₆ | | ¥ | — | — | —
  | instruction_breakpoint | 076₁₆ | precise | 6.1 | H | H | H
□ | implementation_dependent_exception_n (impl. dep. #35-V8-Cs20) | 077₁₆ | | ¥ | — | — | —
— | Reserved | 078₁₆ | | | | |
□ | implementation_dependent_exception_n (impl. dep. #35-V8-Cs20) | 079₁₆-07B₁₆ | | ¥ | — | — | —
● | cpu_mondo | 07C₁₆ | disrupting | 16.08 | P (ie) | P (ie) | (pend)
● | dev_mondo | 07D₁₆ | disrupting | 16.11 | P (ie) | P (ie) | (pend)
● | resumable_error | 07E₁₆ | disrupting | 33.3 | P (ie) | P (ie) | (pend)
□ | implementation_dependent_exception_15 (impl. dep. #35-V8-Cs20) | 07F₁₆ | | ¥ | — | — | —
— | nonresumable_error (generated by hyperprivileged software, not by hardware) | 07F₁₆ | | | | |





TABLE 12-4 Exception and Interrupt Requests, by TT Value (5 of 6)

(UA-2005: ●=Req'd., ○=Opt'l. NP, Priv, HP: mode in which the trap is delivered, and
conditioning applied, based on the current privilege mode.)

UA-2005 | Exception or Interrupt Request | TT (Trap Type) | Trap Category | Priority (0 = Highest) | NP | Priv | HP
● | spill_n_normal (n = 0-7) | 080₁₆-09C₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
— | (reserved for use by spill_7_normal; see footnote for trap type 09C₁₆) | 09D₁₆-09F₁₆ | | | | |
● | spill_n_other (n = 0-7) | 0A0₁₆-0BC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
— | (reserved for use by spill_7_other; see footnote for trap type 0BC₁₆) | 0BD₁₆-0BF₁₆ | | | | |
● | fill_n_normal (n = 0-7) | 0C0₁₆-0DC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
— | (reserved for use by fill_7_normal; see footnote for trap type 0DC₁₆) | 0DD₁₆-0DF₁₆ | | | | |
● | fill_n_other (n = 0-7) | 0E0₁₆-0FC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
— | (reserved for use by fill_7_other; see footnote for trap type 0FC₁₆) | 0FD₁₆-0FF₁₆ | | | | |
● | trap_instruction | 100₁₆-17F₁₆ | precise | 16.02 | P (nm) | P (nm) | HU (nm)
TABLE 12-4 Exception and Interrupt Requests, by TT Value (6 of 6)

(UA-2005: ●=Req'd., ○=Opt'l. NP, Priv, HP: mode in which the trap is delivered, and
conditioning applied, based on the current privilege mode.)

UA-2005 | Exception or Interrupt Request | TT (Trap Type) | Trap Category | Priority (0 = Highest) | NP | Priv | HP
● | htrap_instruction | 180₁₆-1FF₁₆ | precise | 16.02 | -X- | H (nm) | HU (nm)
● | guest_watchdog§ | TT§ | precise or disrupting§ | 9 | H (nm) | H (nm) | -X-


* Although these trap priorities are recommended, all trap priorities are implementation dependent (impl. dep. #36-V8 on page
485), including relative priorities within a given priority level.

* This exception type is only used in UltraSPARC Architecture 2005 implementations that support hardware MMU table walking.
See the description of this exception in Exception and Interrupt Descriptions on page 497.

† The trap vector entry (32 bytes) for this trap type plus the next three trap types (a total of 128 bytes) are permanently reserved
for this exception.

§ The guest_watchdog trap is caused when TL ≥ MAXPTL and any precise or disrupting trap occurs that is destined for privileged
mode. guest_watchdog shares a trap table offset with watchdog_reset (40₁₆), but retains the trap type (TT) value and priority
of the exception that caused the trap.

^ watchdog_reset uses the trap vector entry for trap type 002₁₆ (trap table offset 40₁₆), but retains the trap type (TT) value of the
exception that caused entry into error_state.

+ RED_state_exception uses the trap vector entry for trap type 005₁₆ (trap table offset A0₁₆), but retains the trap type (TT) value
and priority of the exception that caused the trap.

+ The priority of internal_processor_error is implementation dependent (impl. dep. #402-510).
¥ The priority of an implementation_dependent_exception_n trap is implementation dependent (impl. dep. #35-V8-Cs20).
¶ This exception is deprecated, because the only instructions that can generate it have been deprecated.
TABLE 12-5 Exception and Interrupt Requests, by Priority (1 of 4)

(UA-2005: ●=Req'd., ○=Opt'l, □=Impl-Specific. NP, Priv, HP: mode in which the trap is
delivered, and conditioning applied, based on the current privilege mode.)

UA-2005 | Exception or Interrupt Request | TT (Trap Type) | Trap Category | Priority (0 = Highest) | NP | Priv | HP
  | power_on_reset | 001₁₆ | reset | 0 | H (nm) | H (nm) | H (nm)
  | externally_initiated_reset | 003₁₆ | reset | 1.1 | H (nm) | H (nm) | H (nm)
  | watchdog_reset | TT^ | reset | 1.2 | H (nm) | H (nm) | H (nm)
  | software_initiated_reset | 004₁₆ | reset | 1.3 | -X- | -X- | H (nm)
  | store_error | 007₁₆ | deferred | 2.01 | H (nm) | H (nm) | H (nm)
  | trap_level_zero | 05F₁₆ | precise | 2.02 | H | H | -X-
  | instruction_real_translation_miss | 03E₁₆ | precise | 2.08 | H (nm) | H (nm) | -X-
  | fast_instruction_access_MMU_miss | 064₁₆† | precise | 2.08 | H (nm) | H (nm) | -X-
  | instruction_invalid_TSB_entry | 02A₁₆ | precise | 2.10 | H (nm) | H (nm) | -X-
  | instruction_access_exception | 008₁₆ | precise | 3 | H (nm) | H (nm) | HU (nm)
  | instruction_access_error | 00A₁₆ | precise | 4 | H (nm) | H (nm) | H (nm)
  | instruction_breakpoint | 076₁₆ | precise | 6.1 | H | H | H
  | illegal_instruction | 010₁₆ | precise | 6.2 | H (nm) | H (nm) | H (nm)
  | privileged_opcode | 011₁₆ | precise | 7 | P (nm) | -X- | -X-
  | fp_disabled | 020₁₆ | precise | 8 | P (nm) | P (nm) | HU (nm)
TABLE 12-5 Exception and Interrupt Requests, by Priority (2 of 4)

(UA-2005: ●=Req'd., ○=Opt'l, □=Impl-Specific. NP, Priv, HP: mode in which the trap is
delivered, and conditioning applied, based on the current privilege mode.)

UA-2005 | Exception or Interrupt Request | TT (Trap Type) | Trap Category | Priority (0 = Highest) | NP | Priv | HP
● | spill_n_normal (n = 0-7) | 080₁₆-09C₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
● | spill_n_other (n = 0-7) | 0A0₁₆-0BC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
● | fill_n_normal (n = 0-7) | 0C0₁₆-0DC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
● | fill_n_other (n = 0-7) | 0E0₁₆-0FC₁₆† | precise | 9 | P (nm) | P (nm) | HU (nm)
● X | clean_window | 024₁₆† | precise | 10.1 | P (nm) | P (nm) | HU (nm)
● X | LDDF_mem_address_not_aligned | 035₁₆ | precise | 10.1 | H (nm) | H (nm) | HU (nm)
● | STDF_mem_address_not_aligned | 036₁₆ | precise | 10.1 | H (nm) | H (nm) | HU (nm)
○ | LDQF_mem_address_not_aligned | 038₁₆ | precise | 10.1 | H (nm) | H (nm) | HU (nm)
○ | STQF_mem_address_not_aligned | 039₁₆ | precise | 10.1 | H (nm) | H (nm) | HU (nm)
● | mem_address_not_aligned | 034₁₆ | precise | 10.2 | H | H | HU
○ | fp_exception_other | 022₁₆ | precise | 11.1 | P (nm) | P (nm) | HU (nm)
○ | fp_exception_ieee_754 | 021₁₆ | precise | 11.1 | P (nm) | P (nm) | HU (nm)
● | privileged_action | 037₁₆ | precise | 11.1 | H (nm) | H (nm) | -X-
○ | VA_watchpoint | 062₁₆ | precise | 11.2 | P (nm) | P (nm) | -X-
● | data_access_exception | 030₁₆ | precise | 12.01 | H (nm) | H (nm) | HU (nm)
TABLE 12-5 Exception and Interrupt Requests, by Priority (3 of 4)

UA-2005: ● = Req'd., ○ = Optl, ☐ = Impl-Specific; TT = Trap Type; Category = Trap Category; Priority: 0 = Highest
NP / Priv / HP: Mode in which Trap is Delivered (and Conditioning Applied), based on Current Privilege Mode

   Specific Exception or Interrupt Request TT              Category    Priority  NP      Priv    HP
●  data_real_translation_miss              03F₁₆           precise               H (nm)  H (nm)  H (nm)
●  fast_data_access_MMU_miss               068₁₆†          precise               H (nm)  H (nm)  H (nm)
                                                                       12.03
○  data_invalid_TSB_entry                  02B₁₆           precise               H (nm)  H (nm)  H (nm)
●  fast_data_access_protection             06C₁₆†          precise               H (nm)  H (nm)  H (nm)
                                                                       12.07
●  data_access_protection                  033₁₆           precise               H (nm)  H (nm)  H (nm)
○  PA_watchpoint (RA_watchpoint)           061₁₆           precise     12.09     H (nm)  H (nm)  H (nm)
○  data_access_error                       032₁₆           precise     12.10     H (nm)  H (nm)  H (nm)
●  tag_overflowᴾ                           023₁₆           precise     14        P (nm)  P (nm)  H (nm)
●  division_by_zero                        028₁₆           precise     15        P (nm)  P (nm)  H (nm)
●  hstick_match                            05E₁₆           disrupting  16.01     H (nm)  H (nm)  H (ie)
●  trap_instruction                        100₁₆–17F₁₆     precise               P (nm)  P (nm)  H (nm)
                                                                       16.02
●  htrap_instruction                       180₁₆–1FF₁₆     precise               -X-     H (nm)  H (nm)
●  cpu_mondo                               07C₁₆           disrupting  16.08     P (ie)  P (ie)  (pend)
●  dev_mondo                               07D₁₆           disrupting  16.11     P (ie)  P (ie)  (pend)
●  interrupt_level_n (n = 1–15)            041₁₆–04F₁₆     disrupting  32 − n    P (ie)  P (ie)  (pend)
                                                                       (31 to 17)
TABLE 12-5 Exception and Interrupt Requests, by Priority (4 of 4)

UA-2005: ● = Req'd., ○ = Optl, ☐ = Impl-Specific; TT = Trap Type; Category = Trap Category; Priority: 0 = Highest
NP / Priv / HP: Mode in which Trap is Delivered (and Conditioning Applied), based on Current Privilege Mode

   Specific Exception or Interrupt Request TT              Category    Priority  NP      Priv    HP
●  resumable_error                         07E₁₆           disrupting  33.3      P (ie)  P (ie)  (pend)
●  guest_watchdog§                         TT§             precise or  §         H (nm)  H (nm)  -X-
                                                           disrupting
●  RED_state_exception                     TT#             precise     #         H (nm)  H (nm)  H (nm)
○  internal_processor_error                029₁₆           precise     ◊         H (nm)  H (nm)  H (nm)
○  implementation_dependent_exception_n    070₁₆–075₁₆,    –           V         –       –       –
   (impl. dep. #35-V8-Cs20)                077₁₆,
                                           079₁₆–07B₁₆,
                                           07F₁₆
–  nonresumable_error                      07F₁₆           (generated by hyperprivileged software, not by hardware)

Although these trap priorities are recommended, all trap priorities are implementation dependent (impl. dep. #36-V8 on page 485), including relative priorities within a given priority level.

* This exception type is only used in UltraSPARC Architecture 2005 implementations that support hardware MMU table walking. See the description of this exception in Exception and Interrupt Descriptions on page 497.

† The trap vector entries (32 bytes each) for this trap type and the next three trap types (128 bytes in total) are permanently reserved for this exception.

§ The guest_watchdog trap is caused when TL ≥ MAXPTL and any precise or disrupting trap occurs that is destined for privileged mode. guest_watchdog shares a trap table offset with watchdog_reset (40₁₆), but retains the trap type (TT) value and priority of the exception that caused the trap.

‡ watchdog_reset uses the trap vector entry for trap type 002₁₆ (trap table offset 40₁₆), but retains the trap type (TT) value of the exception that caused entry into error_state.

# RED_state_exception uses the trap vector entry for trap type 005₁₆ (trap table offset A0₁₆), but retains the trap type (TT) value and priority of the exception that caused the trap.

◊ The priority of internal_processor_error is implementation dependent (impl. dep. #402-S10).

V The priority of an implementation_dependent_exception_n trap is implementation dependent (impl. dep. #35-V8-Cs20).

ᴾ This exception is deprecated, because the only instructions that can generate it have been deprecated.
12.5.7.1 Trap Type for Spill/Fill Traps


The trap type for window spill/fill traps is determined by the contents of the OTHERWIN and WSTATE registers, as described below and shown in FIGURE 12-9.


Bit    Field          Description
8:6    spill_or_fill  010₂ for spill traps; 011₂ for fill traps
5      other          (OTHERWIN ≠ 0)
4:2    wtype          If (other) then WSTATE.other; else WSTATE.normal


Bits:   8:6             5       4:2     1:0
      | spill_or_fill | other | wtype | 0 0 |
FIGURE 12-9 Trap Type Encoding for Spill/Fill Traps
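The following Python sketch composes the 9-bit trap type from the fields in FIGURE 12-9. It is illustrative only: the function name and its integer parameters (the OTHERWIN value and the 3-bit WSTATE.normal and WSTATE.other fields) are assumptions of this example, not interfaces defined by this specification.

```python
def spill_fill_tt(is_fill: bool, otherwin: int, wstate_normal: int, wstate_other: int) -> int:
    """Compose the 9-bit spill/fill trap type described in FIGURE 12-9 (hypothetical helper)."""
    if not (0 <= wstate_normal <= 7 and 0 <= wstate_other <= 7):
        raise ValueError("WSTATE.normal and WSTATE.other are 3-bit fields")
    spill_or_fill = 0b011 if is_fill else 0b010        # bits 8:6
    other = 1 if otherwin != 0 else 0                   # bit 5: (OTHERWIN != 0)
    wtype = wstate_other if other else wstate_normal    # bits 4:2
    return (spill_or_fill << 6) | (other << 5) | (wtype << 2)  # bits 1:0 are 00


# Spill with OTHERWIN = 0 and WSTATE.normal = 2, that is, spill_2_normal
print(hex(spill_fill_tt(False, otherwin=0, wstate_normal=2, wstate_other=5)))  # 0x88
```

Because bits 1:0 are always zero, each spill/fill trap type is a multiple of 4, which matches the four consecutive trap vector entries (128 bytes) reserved for each spill and fill handler in TABLE 12-5.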


12.5.8 Trap Priorities


TABLE 12-4 on page 475 and TABLE 12-5 on page 481 show the assignment of traps to 
TT values and the relative priority of traps and interrupt requests. A trap priority is 
an ordinal number, with 0 indicating the highest priority and greater priority 
numbers indicating decreasing priority; that is, if x < y, a pending exception or 
interrupt request with priority x is taken instead of a pending exception or interrupt 
request with priority y. Traps within the same priority class (0 to 33) are listed in 
priority order in TABLE 12-5 (impl. dep. #36-V8). 


IMPL. DEP. #36-V8: The relative priorities of traps defined in the UltraSPARC 
Architecture are fixed. However, the absolute priorities of those traps are 
implementation dependent (because a future version of the architecture may define 
new traps). The priorities (both absolute and relative) of any new traps are 
implementation dependent. 


However, the TT values for the exceptions and interrupt requests shown in 
TABLE 12-4 and TABLE 12-5 must remain the same for every implementation. 


The trap priorities given above must always be considered within the context of
how the virtual processor actually issues and executes instructions. For example, if
an instruction_access_error occurs (priority 3), it will be taken even if the instruction
is an SIR (priority 1). This situation occurs because the virtual processor detects
instruction_access_error during instruction fetch and never actually issues or
executes the instruction, so the SIR instruction is never seen by the execution units of
the virtual processor. This is an obvious case, but there are other, more subtle cases.
12.6 Trap Processing


The virtual processor's action during trap processing depends on various virtual
processor states, including the trap type, the current level of trap nesting (given in
the TL register), HPSTATE, and PSTATE. When a trap occurs, the GL register is
normally incremented by one (described later in this section), which replaces the set
of eight global registers with the next consecutive set.


The following traps are processed in RED_state:
■ POR, XIR, and WDR reset requests
■ SIR reset request when TL < MAXTL
■ Nonreset traps taken when TL = MAXTL − 1
■ Traps taken when the virtual processor is in RED_state


All other traps are handled in execute_state using normal trap processing.


During normal operation, the virtual processor is in execute_state. It processes
traps in execute_state and continues.


When a nonreset trap or software-initiated reset (SIR) occurs with TL = MAXTL, there
are no more levels on the trap stack, so the virtual processor enters the transitory
state error_state. The virtual processor remains in error_state for an
implementation-dependent duration, then generates a WDR reset (impl. dep. #254-U3-Cs10)
to effect a change from error_state to RED_state.


Traps processed in RED_state use a special trap vector and a special trap-vectoring
algorithm. RED_state vectoring and the setting of the TT value for RED_state
traps are described in RED_state Trap Table Organization on page 472.


Traps that occur with TL = MAXTL − 1 are processed in RED_state. Reset traps are
also processed in RED_state; reset trap processing is described in RED_state Trap
Processing on page 490. Finally, software can force the processor into RED_state by
setting the HPSTATE.red bit to 1.


Once the virtual processor has entered RED_state, no matter how it got there, all
subsequent traps are processed in RED_state until software returns the virtual
processor to execute_state or a nonreset (normal) or SIR trap is taken with TL = MAXTL,
which puts the virtual processor in error_state.
TABLE 12-6, TABLE 12-7, and TABLE 12-8 describe the virtual processor mode and trap-level
transitions involved in handling traps.


TABLE 12-6 Trap Received While in execute_state

                    New State, After Receiving Trap Type
Original State      Nonreset Trap     POR           XIR           WDR†   SIR
                    or Interrupt
execute_state       execute_state     RED_state     RED_state     –      RED_state
TL < MAXTL − 1      TL ← TL + 1       TL = MAXTL    TL ← TL + 1          TL ← TL + 1

execute_state       RED_state         RED_state     RED_state     –      RED_state
TL = MAXTL − 1      TL = MAXTL        TL = MAXTL    TL = MAXTL           TL = MAXTL

execute_state*      error_state       RED_state     RED_state     –      error_state
TL = MAXTL          TL = MAXTL        TL = MAXTL    TL = MAXTL           TL = MAXTL

* This state occurs when software changes TL to MAXTL and leaves HPSTATE.red = 0, or if software sets HPSTATE.red ← 0 while TL = MAXTL.
† WDR can only be generated from error_state.


TABLE 12-7 Trap Received While in RED_state

                    New State, After Receiving Trap Type
Original State      Nonreset Trap     POR           XIR           WDR†   SIR
                    or Interrupt
RED_state           RED_state         RED_state     RED_state     –      RED_state
TL < MAXTL − 1      TL ← TL + 1       TL = MAXTL    TL ← TL + 1          TL ← TL + 1

RED_state           RED_state         RED_state     RED_state     –      RED_state
TL = MAXTL − 1      TL = MAXTL        TL = MAXTL    TL = MAXTL           TL = MAXTL

RED_state           error_state       RED_state     RED_state     –      error_state
TL = MAXTL          TL = MAXTL        TL = MAXTL    TL = MAXTL           TL = MAXTL

† WDR can only be generated from error_state.
TABLE 12-8 Reset Received While in error_state

                    New State, After Receiving Trap Type
Original State      Nonreset Trap     POR           XIR           WDR           SIR
                    or Interrupt
error_state         –                 RED_state     RED_state     RED_state     –
TL = MAXTL                            TL = MAXTL    TL = MAXTL    TL = MAXTL





The virtual processor does not recognize interrupts while it is in error_state. 


A non-reset trap causes the following state changes to occur: 













■ If the virtual processor is already in RED_state, the new trap is processed in
RED_state unless TL = MAXTL. See Nonreset Traps When the Virtual Processor Is in
RED_state on page 496.

■ If the virtual processor is in execute_state and the trap level is one less than
its maximum value, that is, TL = MAXTL − 1, then the virtual processor enters
RED_state. See RED_state on page 457 and Nonreset Traps with TL = MAXTL − 1
on page 490.

■ If the virtual processor is in either execute_state or RED_state and the trap
level is already at its maximum value, that is, TL = MAXTL, then the virtual
processor enters error_state. See error_state on page 460.


Otherwise, the trap uses normal trap processing, described in the following section
on Normal Trap Processing.
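The rules above, together with TABLE 12-6 and TABLE 12-7, can be summarized as a small decision function. This is a sketch under stated assumptions: the Mode enum and the function name are hypothetical, MAXTL is passed in as a parameter, and reset traps, which follow their own columns in the tables, are not modeled.

```python
from enum import Enum


class Mode(Enum):
    EXECUTE = "execute_state"
    RED = "RED_state"
    ERROR = "error_state"


def nonreset_trap_target(state: Mode, tl: int, maxtl: int) -> tuple[Mode, int]:
    """New (state, TL) after a nonreset trap or interrupt, per TABLES 12-6 and 12-7."""
    if state is Mode.ERROR:
        # The virtual processor does not recognize interrupts in error_state;
        # this sketch treats any nonreset request there as a usage error.
        raise ValueError("nonreset traps are not processed in error_state")
    if not 0 <= tl <= maxtl:
        raise ValueError("TL out of range")
    if tl == maxtl:
        return Mode.ERROR, maxtl       # no trap-stack levels left
    if tl == maxtl - 1:
        return Mode.RED, maxtl         # last level: RED_state processing
    if state is Mode.RED:
        return Mode.RED, tl + 1        # RED_state persists until software leaves it
    return Mode.EXECUTE, tl + 1        # normal trap processing (12.6.1)


# Example with MAXTL = 6, chosen only for illustration
print(nonreset_trap_target(Mode.EXECUTE, 5, 6))  # (<Mode.RED: 'RED_state'>, 6)
```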


12.6.1 Normal Trap Processing 





Normal traps comprise all traps processed in execute_state; that is, all
non-RED_state and non-error_state traps.


A trap is delivered in either privileged mode or hyperprivileged mode, depending 
on the type of trap, the trap level (TL), and the privilege mode in effect when the 
exception was detected. 


During normal trap processing, the following state changes occur (conceptually, in 
this order): 


■ The trap level is updated. This provides access to a fresh set of privileged trap-state
registers used to save the current state, in effect, pushing a frame on the trap stack.
  TL ← TL + 1   // note that if TL = MAXTL − 1 before this trap, the
                // trap would have been processed in RED_state,
                // not here using normal trap processing.

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE   // even for traps to privileged mode

■ The trap type is preserved.
  TT[TL] ← the trap type

■ The Global Level register (GL) is updated. This normally provides access to a
fresh set of global registers:
  if (the trap is being delivered in privileged mode)
  then GL ← min(GL + 1, MAXPGL)
  else // trap is being delivered in hyperprivileged mode
       GL ← min(GL + 1, MAXGL)
  endif

■ The PSTATE register is updated to a predefined state (even for traps to
hyperprivileged mode):
  PSTATE.mm is unchanged
  PSTATE.pef ← 1    // if an FPU is present, it is enabled
  PSTATE.am ← 0     // address masking is turned off
  if (the trap is being delivered in privileged mode)
  then PSTATE.priv ← 1           // the virtual processor enters privileged mode
       PSTATE.cle ← PSTATE.tle   // set endian mode for traps
  else // trap is being delivered in hyperprivileged mode
       PSTATE.priv ← 0
       PSTATE.cle ← 0
  endif
  PSTATE.ie ← 0     // interrupts are disabled
  PSTATE.tle is unchanged
  PSTATE.tct ← 0    // trap on CTI disabled

■ The HPSTATE register is updated:
  if (the trap is to hyperprivileged mode)
  then HPSTATE.red ← 0
       HPSTATE.hpriv ← 1   // enter hyperprivileged mode
       HPSTATE.ibe ← 0     // disable instruction breakpoints
       HPSTATE.tlz is unchanged
  endif

■ For a register-window trap (clean_window, window spill, or window fill) only,
CWP is set to point to the register window that must be accessed by the trap-handler
software, that is:
  if (TT[TL] = 024₁₆)              // a clean_window trap
  then CWP ← CWP + 1
  endif
  if (080₁₆ ≤ TT[TL] ≤ 0BF₁₆)      // window spill trap
  then CWP ← CWP + CANSAVE + 2
  endif
  if (0C0₁₆ ≤ TT[TL] ≤ 0FF₁₆)      // window fill trap
  then CWP ← CWP − 1
  endif
  For non-register-window traps, CWP is not changed.

■ Control is transferred into the trap table:
  // Note that at this point, TL has already been incremented (above)
  if ( (trap is to privileged mode) and (TL ≤ MAXPTL) )
  then
     // the trap is handled in privileged mode
     // Note: The expression "(TL > 1)" below evaluates to the
     // value 0₂ if TL was 0 just before the trap (in which
     // case, TL = 1 now, since it was incremented above,
     // during trap entry). "(TL > 1)" evaluates to 1₂ if
     // TL was > 0 before the trap.
     PC  ← TBA{63:15} :: (TL > 1) :: TT[TL] :: 0 0000₂
     NPC ← TBA{63:15} :: (TL > 1) :: TT[TL] :: 0 0100₂
  else if ( (trap is to privileged mode) and (TL > MAXPTL) )
  then
     // this is the guest_watchdog case; the trap is handled in
     // hyperprivileged mode using trap table offset 40₁₆.
     PC  ← HTBA{63:14} :: 00₂ :: 040₁₆
     NPC ← HTBA{63:14} :: 00₂ :: 044₁₆
  else // trap is handled in hyperprivileged mode
     PC  ← HTBA{63:14} :: TT[TL] :: 0 0000₂
     NPC ← HTBA{63:14} :: TT[TL] :: 0 0100₂
  endif


Interrupts are ignored as long as PSTATE.ie = 0. 
Programming Note | State in TPC[n], TNPC[n], TSTATE[n], and TT[n] is changed
autonomously by the processor only when a trap is taken while TL = n − 1;
however, software can change any of these values
with a WRPR instruction when TL = n.
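To tie the steps of normal trap processing together, here is a minimal Python model. It is a sketch, not a reference implementation: the VCpu class and normal_trap() are hypothetical, only a subset of state is modeled (CCR and ASI are omitted), the MAXTL, MAXPTL, MAXPGL, MAXGL, and NWINDOWS defaults are illustrative assumptions, CWP arithmetic is taken modulo NWINDOWS as in SPARC V9, and the guest_watchdog case is treated as a delivery in hyperprivileged mode, consistent with TABLE 12-5.

```python
from dataclasses import dataclass, field

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


@dataclass
class VCpu:
    """Hypothetical, simplified virtual-processor state; CCR, ASI, etc. are omitted."""
    tl: int = 0
    gl: int = 0
    cwp: int = 0
    cansave: int = 0
    nwindows: int = 8                      # assumed NWINDOWS
    pc: int = 0
    npc: int = 4
    tba: int = 0
    htba: int = 0
    pstate: dict = field(default_factory=lambda: dict(
        mm=0, pef=0, am=0, priv=0, cle=0, tle=0, ie=1, tct=0))
    hpstate: dict = field(default_factory=lambda: dict(red=0, hpriv=0, ibe=0, tlz=0))
    trap_stack: dict = field(default_factory=dict)   # TL -> state saved at that level


def normal_trap(cpu: VCpu, tt: int, to_hpriv: bool, maxtl: int = 6,
                maxptl: int = 2, maxpgl: int = 2, maxgl: int = 3) -> None:
    """Apply the 12.6.1 normal trap-processing steps to cpu (illustrative only)."""
    if cpu.tl >= maxtl - 1:
        raise ValueError("TL >= MAXTL - 1: RED_state or error_state processing applies")

    # TL <- TL + 1, then save state into the new trap-stack frame.
    cpu.tl += 1
    keep = MASK32 if cpu.pstate["am"] else MASK64    # upper 32 bits zeroed if PSTATE.am = 1
    cpu.trap_stack[cpu.tl] = dict(
        gl=cpu.gl, cwp=cpu.cwp, pstate=dict(cpu.pstate), hpstate=dict(cpu.hpstate),
        tpc=cpu.pc & keep, tnpc=cpu.npc & keep, tt=tt)

    # A privileged-mode trap with TL > MAXPTL (after the increment) is the
    # guest_watchdog case; this sketch treats it as a hyperprivileged delivery.
    guest_watchdog = not to_hpriv and cpu.tl > maxptl
    hpriv = to_hpriv or guest_watchdog

    cpu.gl = min(cpu.gl + 1, maxgl if hpriv else maxpgl)

    p = cpu.pstate                                     # PSTATE.mm and PSTATE.tle unchanged
    p["pef"], p["am"] = 1, 0
    if hpriv:
        p["priv"], p["cle"] = 0, 0
    else:
        p["priv"], p["cle"] = 1, p["tle"]
    p["ie"], p["tct"] = 0, 0
    if hpriv:
        cpu.hpstate.update(red=0, hpriv=1, ibe=0)      # HPSTATE.tlz unchanged

    # Register-window traps: point CWP at the window the handler must use.
    if tt == 0x024:                                    # clean_window
        cpu.cwp = (cpu.cwp + 1) % cpu.nwindows
    elif 0x080 <= tt <= 0x0BF:                         # window spill
        cpu.cwp = (cpu.cwp + cpu.cansave + 2) % cpu.nwindows
    elif 0x0C0 <= tt <= 0x0FF:                         # window fill
        cpu.cwp = (cpu.cwp - 1) % cpu.nwindows

    # Vector into the trap table; NPC is always PC + 4.
    if not hpriv:         # TBA{63:15} :: (TL > 1) :: TT[TL] :: 0 0000 (binary)
        base = (cpu.tba & ~0x7FFF) | (int(cpu.tl > 1) << 14) | (tt << 5)
    elif guest_watchdog:  # HTBA{63:14} :: 00 (binary) :: 040 (hex)
        base = (cpu.htba & ~0x3FFF) | 0x040
    else:                 # HTBA{63:14} :: TT[TL] :: 0 0000 (binary)
        base = (cpu.htba & ~0x3FFF) | (tt << 5)
    cpu.pc, cpu.npc = base, base + 4


cpu = VCpu(tba=0x1_0000_0000, pc=0x4000, npc=0x4004)
normal_trap(cpu, tt=0x028, to_hpriv=False)   # division_by_zero, TL 0 -> 1
print(hex(cpu.pc))                           # 0x100000500
```

In this example the trap vectors to TBA + 500₁₆, because TT[TL] = 028₁₆ occupies PC bits 13:5 and the (TL > 1) bit 14 is 0; a second such trap at TL = 1 would vector to TBA + 4500₁₆.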


12.6.2 RED_state Trap Processing


The following conditions invoke RED_state trap processing and cause the trap to
be delivered in hyperprivileged mode:


■ Traps taken with TL = MAXTL − 1
■ Power-on reset traps
■ Watchdog reset traps
■ Externally initiated reset traps
■ Software-initiated reset traps
■ Traps taken when the virtual processor is already in RED_state


IMPL. DEP. #38-V8: Implementation-dependent registers may or may not be
affected by the various reset traps.


12.6.2.1 Nonreset Traps with TL = MAXTL − 1


Nonreset traps that occur when TL = MAXTL − 1 are processed in RED_state.
The following state changes occur (conceptually, in this order) during a nonreset
trap that occurs when TL = MAXTL − 1:

■ The trap level is advanced.
  TL ← MAXTL

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE

■ The trap type is preserved.
  TT[TL] ← the trap type

■ The Global Level register is updated.
  GL ← min(GL + 1, MAXGL)

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle is unchanged   // (was unspecified in SPARC V9 specification)
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz   ← 0   // disable trap level zero exceptions

■ For a register-window trap only, CWP is set to point to the register window that
must be accessed by the trap-handler software, that is:
  if (TT[TL] = 024₁₆)              // a clean_window trap
  then CWP ← CWP + 1
  endif
  if (080₁₆ ≤ TT[TL] ≤ 0BF₁₆)      // window spill trap
  then CWP ← CWP + CANSAVE + 2
  endif
  if (0C0₁₆ ≤ TT[TL] ≤ 0FF₁₆)      // window fill trap
  then CWP ← CWP − 1
  endif
  For non-register-window traps, CWP is not changed.

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table. See Trap Table Entry Address
to RED_state on page 471 for further details of RSTVADDR.
  PC  ← RSTVADDR{63:8} :: 1010 0000₂
  NPC ← RSTVADDR{63:8} :: 1010 0100₂


12.6.2.2 Power-On Reset (POR) Traps


A POR trap occurs when power is applied to the virtual processor. If the virtual
processor is in error_state, a POR brings the virtual processor out of
error_state and places it in RED_state. See Chapter 16, Resets for further
details.


Virtual processor state is undefined after POR, except for the following:

■ The trap level is set.
  TL ← MAXTL

■ The trap type is set.
  TT[TL] ← 001₁₆

■ The Global Level register is updated.
  GL ← MAXGL

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle  ← 0     // big-endian mode for traps
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz   ← 0   // disable trap level zero exceptions

■ The TICK register is protected.
  TICK.npt ← 1   // TICK is unreadable by nonprivileged software

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table.
  PC  ← RSTVADDR{63:8} :: 0010 0000₂
  NPC ← RSTVADDR{63:8} :: 0010 0100₂


12.6.2.3 Watchdog Reset (WDR) Traps


Entry to error_state is caused by the occurrence of a trap when TL = MAXTL (impl.
dep. #39-V8-Cs10). See error_state on page 460.


To recover from error_state, the UltraSPARC Architecture provides
watchdog reset (WDR), which causes a transition from error_state to
RED_state (impl. dep. #254-U3-Cs10).


The following virtual processor state changes occur during WDR (conceptually, in
this order):

■ The trap level is updated.
  TL ← min(TL + 1, MAXTL)

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE

■ The trap type is set.
  TT[TL] ← the trap type that caused the WDR

■ The Global Level register is updated.
  GL ← min(GL + 1, MAXGL)

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle is unchanged   // (was unspecified in SPARC V9 specification)
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz   ← 0   // disable trap level zero exceptions

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table.
  PC  ← RSTVADDR{63:8} :: 0100 0000₂
  NPC ← RSTVADDR{63:8} :: 0100 0100₂


12.6.2.4 Externally Initiated Reset (XIR) Traps


XIR traps are initiated by an external signal. They behave like an interrupt that
cannot be masked by PSTATE.ie = 0 or PIL. Typically, XIR is used for critical system
events such as power failure, a pressed reset button, a failure of external components
that does not require a WDR (which aborts operations), or a systemwide reset in a
multiprocessor. See Chapter 16, Resets for further details.
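The masking difference between XIR and ordinary interrupts fits in a few lines. The interrupt_level_n condition below (taken only when PSTATE.ie = 1 and the request level exceeds PIL) is assumed from the SPARC V9 interrupt rules rather than restated in this section; the function name and string tags are hypothetical.

```python
def is_masked(request: str, pstate_ie: int, pil: int, level: int = 0) -> bool:
    """Return True if the request is currently held off (hypothetical helper)."""
    if request == "XIR":
        return False                    # neither PSTATE.ie = 0 nor PIL masks XIR
    if request == "interrupt_level_n":
        # Assumed SPARC V9 rule: taken only if PSTATE.ie = 1 and level > PIL.
        return pstate_ie == 0 or level <= pil
    raise ValueError(f"request not modeled here: {request}")


print(is_masked("XIR", pstate_ie=0, pil=15))  # False
```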


If TL = MAXTL, then the virtual processor enters error_state. The following
virtual processor state changes occur during XIR (conceptually, in this order):

■ The trap level is updated:
  TL ← min(TL + 1, MAXTL)

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE

■ The trap type is set.
  TT[TL] ← 003₁₆

■ The Global Level register is updated.
  GL ← min(GL + 1, MAXGL)

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle is unchanged   // (was unspecified in SPARC V9 specification)
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz   ← 0   // disable trap level zero exceptions

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table.
  PC  ← RSTVADDR{63:8} :: 0110 0000₂
  NPC ← RSTVADDR{63:8} :: 0110 0100₂


See Externally Initiated Reset (XIR) on page 565 and the documentation for specific
processor implementations for more information.


12.6.2.5 Software-Initiated Reset (SIR) Traps


A software-initiated reset trap is initiated by execution of an SIR instruction in
hyperprivileged mode. Hyperprivileged software uses the SIR trap as a panic
operation or a metahypervisor trap. See Chapter 16, Resets for further details.


If TL = MAXTL, then the virtual processor enters error_state.


Otherwise, TL < MAXTL as trap processing begins and the following virtual processor
state changes occur (conceptually, in this order):

■ The trap level is updated.
  TL ← TL + 1

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE

■ The trap type is set.
  TT[TL] ← 004₁₆

■ The Global Level register is updated.
  GL ← min(GL + 1, MAXGL)

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle is unchanged   // (was unspecified in SPARC V9 specification)
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz   ← 0   // disable trap level zero exceptions

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table.
  PC  ← RSTVADDR{63:8} :: 1000 0000₂
  NPC ← RSTVADDR{63:8} :: 1000 0100₂


See Software-Initiated Reset (SIR) on page 566 and the documentation for specific
processor implementations for more information.


12.6.2.6 Nonreset Traps When the Virtual Processor Is in RED_state


When a nonreset trap occurs while the virtual processor is in RED_state, if
TL = MAXTL, then the virtual processor enters error_state.


Otherwise, TL < MAXTL as trap processing begins, the virtual processor remains in
RED_state, and the following virtual processor state changes occur (conceptually,
in this order):

■ The trap level is updated.
  TL ← TL + 1

■ Existing state is preserved.
  TSTATE[TL].gl       ← GL
  TSTATE[TL].ccr      ← CCR
  TSTATE[TL].asi      ← ASI
  TSTATE[TL].pstate   ← PSTATE
  TSTATE[TL].cwp      ← CWP
  TPC[TL]             ← PC        // (upper 32 bits zeroed if PSTATE.am = 1)
  TNPC[TL]            ← NPC       // (upper 32 bits zeroed if PSTATE.am = 1)
  HTSTATE[TL].hpstate ← HPSTATE

■ The trap type is preserved.
  TT[TL] ← the trap type

■ The Global Level register is updated.
  GL ← min(GL + 1, MAXGL)

■ The PSTATE register is set as follows:
  PSTATE.mm   ← 00₂   // TSO
  PSTATE.pef  ← 1     // if an FPU is present, it is enabled
  PSTATE.am   ← 0     // address masking is turned off
  PSTATE.priv ← 0     // entering hyperprivileged mode
  PSTATE.ie   ← 0     // interrupts are disabled
  PSTATE.cle  ← 0     // big-endian is default for hyperprivileged mode
  PSTATE.tle is unchanged   // (was unspecified in SPARC V9 specification)
  PSTATE.tct  ← 0     // trap on CTI disabled

■ The HPSTATE register is updated:
  HPSTATE.red   ← 1   // enter RED_state
  HPSTATE.hpriv ← 1   // enter hyperprivileged mode
  HPSTATE.ibe   ← 0   // disable instruction breakpoints
  HPSTATE.tlz is unchanged

■ For a register-window trap only, CWP is set to point to the register window that
must be accessed by the trap-handler software, that is:
  if (TT[TL] = 024₁₆)              // a clean_window trap
  then CWP ← CWP + 1
  endif
  if (080₁₆ ≤ TT[TL] ≤ 0BF₁₆)      // window spill trap
  then CWP ← CWP + CANSAVE + 2
  endif
  if (0C0₁₆ ≤ TT[TL] ≤ 0FF₁₆)      // window fill trap
  then CWP ← CWP − 1
  endif
  For non-register-window traps, CWP is not changed.

■ Implementation-specific state changes; for example, disabling an MMU.

■ Control is transferred into the RED_state trap table.
  PC  ← RSTVADDR{63:8} :: 1010 0000₂
  NPC ← RSTVADDR{63:8} :: 1010 0100₂
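The five RED_state vectors in this section differ only in PC bits 7:0. The sketch below tabulates those offsets and forms PC and NPC from RSTVADDR; the dictionary keys and the function name are hypothetical, and the RSTVADDR value used in the example is made up (see Trap Table Entry Address to RED_state on page 471). Each offset equals 32 times the trap type of the corresponding reset (001₁₆ through 004₁₆) or, for the 1010 0000₂ entry, of RED_state_exception (005₁₆).

```python
# RED_state trap-table offsets (PC bits 7:0) from the 12.6.2 pseudocode.
RED_OFFSETS = {
    "POR": 0b0010_0000,       # power-on reset
    "WDR": 0b0100_0000,       # watchdog reset
    "XIR": 0b0110_0000,       # externally initiated reset
    "SIR": 0b1000_0000,       # software-initiated reset
    "NONRESET": 0b1010_0000,  # nonreset trap at TL = MAXTL - 1 or while in RED_state
}


def red_state_vector(rstvaddr: int, kind: str) -> tuple[int, int]:
    """Return (PC, NPC): PC = RSTVADDR{63:8} :: offset, NPC = PC + 4."""
    pc = (rstvaddr & ~0xFF) | RED_OFFSETS[kind]
    return pc, pc + 4


pc, npc = red_state_vector(0xFFFF_FFFF_F000_0000, "SIR")  # hypothetical RSTVADDR
print(hex(pc), hex(npc))  # 0xfffffffff0000080 0xfffffffff0000084
```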





12.7 Exception and Interrupt Descriptions


The following sections describe the various exceptions and interrupt requests and 
the conditions that cause them. Each exception and interrupt request describes the 
corresponding trap type as defined by the trap model. 
All other trap types are reserved.


Note | The encoding of trap types in the UltraSPARC Architecture 
differs from that shown in The SPARC Architecture Manual,
Version 9. Each trap is marked as precise, deferred, disrupting, or
reset. Example exception conditions are included for each 
exception type. Chapter 7, Instructions, enumerates which traps 
can be generated by each instruction. 


The following traps are generally expected to be supported in all UltraSPARC 
Architecture 2005 implementations. A given trap is not required to be supported in 
an implementation in which the conditions that cause the trap can never occur. 


■ clean_window [TT = 024₁₆–027₁₆] (Precise) — A SAVE instruction discovered
that the window about to be used contains data from another address space; the
window must be cleaned before it can be used.


IMPL. DEP. #102-V9: An implementation may choose either to implement
automatic cleaning of register windows in hardware or to generate a
clean_window trap, when needed, so that window(s) can be cleaned by software.
If an implementation chooses the latter option, then support for this trap type is
mandatory.


■ cpu_mondo [TT = 07C₁₆] (Disrupting) — This interrupt is generated when
another virtual processor has enqueued a message for this virtual processor. It is
used to deliver a trap in privileged mode, to inform privileged software that an
interrupt report has been appended to the virtual processor's CPU mondo queue.
A direct message between virtual processors is sent via a CPU mondo interrupt,
which is generated through software calls to hyperprivileged software. The
standard software interface (API) to hyperprivileged software allows 64 bytes of
data to be sent to one or more target virtual processors. When the CPU mondo
queue has a valid entry, a cpu_mondo exception is sent to the target virtual
processor.


■ data_access_error [TT = 032₁₆] (Precise) — A hardware error occurred during a
data access.


m data access exception [TT = 03045] (Precise) — An exception occurred on an 
attempted data access. Detailed information regarding the error is logged into the 
ft field of the DSFSR (Data Synchronous Fault Status register, ASI 5816, 

VA = 1849). 

The conditions that may cause a data_access_exception exception are:

• Privilege Violation — An attempt to access a privileged page (TTE.p = 1) by
any type of load, store, or load-store instruction when executing in
nonprivileged mode (PSTATE.priv = 0). This includes the special case of an
access by privileged software using one of the
ASI_AS_IF_USER_PRIMARY[_LITTLE] or ASI_AS_IF_USER_SECONDARY[_LITTLE] ASIs.

• Illegal Access to Noncacheable Page — An access to a noncacheable page
(TTE.cp = 0) (including cases with the TLB disabled) was attempted by an
atomic load-store instruction (CASA, CASXA, SWAP, SWAPA, LDSTUB, or
LDSTUBA) or an LDTXA instruction.

• Illegal Access to Page That May Cause Side Effects — An attempt was made
to access a page which may cause side effects (TTE.e = 1) (including cases with
the TLB disabled) by any type of load instruction with a nonfaulting ASI.

• Invalid ASI — An attempt was made to execute an invalid combination of
instruction and ASI. See the instruction descriptions in Chapter 7 for a detailed
list of valid ASIs for each instruction that can access alternate address spaces.
The following invalid combinations of instruction, ASI, and virtual address
cause a data_access_exception exception:

– A load, store, load-store, or PREFETCHA instruction with either an invalid
ASI or an invalid virtual address for a valid ASI.

– A disallowed combination of instruction and ASI (see Block Load and Store
ASIs on page 443 and Partial Store ASIs on page 444). This includes the
following:

◦ An attempt to use a Load Twin Extended Word (LDTXA) ASI (see ASIs 10₁₆,
11₁₆, 16₁₆, 17₁₆, and 18₁₆ (ASI_*AS_IF_USER_*) on page 436) with any load
alternate opcode other than LDTXA's (which is shared by LDTWA)

◦ An attempt to use a nontranslating ASI value with any load or store alternate
instruction other than LDXA, LDDFA, STXA, or STDFA

– An attempt to read from a write-only ASI-accessible register

– An attempt to write to a read-only ASI-accessible register

• Illegal Access to Non-Faulting-Only Page — An attempt was made to access a
non-faulting-only page (TTE.nfo = 1) by any type of load, store, or load-store
instruction with an ASI other than a nonfaulting ASI
(PRIMARY_NO_FAULT[_LITTLE] or SECONDARY_NO_FAULT[_LITTLE]).

















Forward Compatibility Note | The next revision of the UltraSPARC Architecture is
expected to replace data_access_exception with several more specific exceptions,
one for each condition that currently can cause a data_access_exception. This will
support slightly faster trap handling for these exceptions and allow elimination
of the D-SFSR register.
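As an illustration only, the page-attribute conditions listed above can be written as a small C decision function. This is a sketch under stated assumptions rather than the architecture's definition: `tte_attrs`, `access_kind`, `dae_cause`, and `classify_data_access` are hypothetical names, the Invalid ASI cases are left out, and the checks are tried in the order the list presents them, which is not a documented priority.

```c
#include <stdbool.h>

/* Hypothetical summary of the TTE bits named in the text. */
struct tte_attrs {
    bool p;    /* TTE.p: privileged page */
    bool cp;   /* TTE.cp: cacheable page (false = noncacheable) */
    bool e;    /* TTE.e: page may cause side effects */
    bool nfo;  /* TTE.nfo: non-faulting-only page */
};

/* ACC_ATOMIC covers CASA, CASXA, SWAP, SWAPA, LDSTUB, and LDSTUBA. */
enum access_kind { ACC_LOAD, ACC_STORE, ACC_ATOMIC, ACC_LDTXA };

enum dae_cause {
    DAE_NONE,
    DAE_PRIVILEGE_VIOLATION,
    DAE_NONCACHEABLE_ATOMIC,
    DAE_SIDE_EFFECT_NOFAULT_LOAD,
    DAE_NFO_PAGE
};

/* nonpriv: the access is made with user privilege (PSTATE.priv = 0, or
 * privileged code using an ASI_AS_IF_USER_* ASI).
 * nofault_asi: the access uses a nonfaulting ASI. */
enum dae_cause classify_data_access(struct tte_attrs t, enum access_kind k,
                                    bool nonpriv, bool nofault_asi)
{
    if (t.p && nonpriv)
        return DAE_PRIVILEGE_VIOLATION;
    if (!t.cp && (k == ACC_ATOMIC || k == ACC_LDTXA))
        return DAE_NONCACHEABLE_ATOMIC;
    if (t.e && k == ACC_LOAD && nofault_asi)
        return DAE_SIDE_EFFECT_NOFAULT_LOAD;
    if (t.nfo && !nofault_asi)
        return DAE_NFO_PAGE;
    return DAE_NONE;
}
```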


■ data_invalid_TSB_entry [TT = 02B₁₆] (Precise) — During an attempted data
access:
• the MMU detected that a translation lookaside buffer did not contain a
translation for the virtual address, and
• the required TTE was found in the configured TSBs to be a real address,
requiring real-to-physical address translation, and
• the real address cannot be translated to a physical address by hardware.
■ data_real_translation_miss [TT = 03F₁₆] (Precise) — During an attempted real
address data access, the MMU detected that a translation lookaside buffer (TLB)
did not contain a translation for the real address (that is, a TLB miss occurred).


■ dev_mondo [TT = 07D₁₆] (Disrupting) — This interrupt causes a trap to be
delivered in privileged mode, to inform privileged software that an interrupt
report has been appended to its device mondo queue. When a virtual processor
has appended a valid entry to a target virtual processor's device mondo queue, it
sends a dev_mondo exception to the target virtual processor. The interrupt report
contents are device specific.


■ division_by_zero [TT = 028₁₆] (Precise) — An integer divide instruction
attempted to divide by zero.


■ externally_initiated_reset (XIR) [TT = 003₁₆] (Reset) — An external signal was
asserted. This trap is used for catastrophic events such as power failure, reset
button pressed, and system-wide reset in multiprocessor systems.


■ fast_data_access_MMU_miss [TT = 068₁₆] (Precise) — During an attempted
data access to memory, the MMU detected that a translation lookaside buffer did
not contain a translation for the virtual address.

Four trap vectors are allocated for this trap, allowing a TLB miss handler of up to
32 instructions to fit within the trap vector area.
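Each trap-table entry holds eight instructions (32 bytes), which is why four consecutive entries leave room for a 32-instruction handler. The following sketch computes an entry address assuming the SPARC V9 layout (TBA supplies bits 63:15, bit 14 is set when the trap is taken while TL > 0, and TT occupies bits 13:5). Trap-Table Entry Addresses on page 469 remains the authoritative description, and `trap_vector_address` is a hypothetical name.

```c
#include <stdint.h>

#define TT_FAST_DATA_ACCESS_MMU_MISS 0x068u  /* trap type from the list above */

/* Sketch of a SPARC V9-style trap-table entry address:
 * TBA{63:15} :: (TL > 0) :: TT{8:0} :: 00000 */
uint64_t trap_vector_address(uint64_t tba, unsigned tl_before_trap, unsigned tt)
{
    uint64_t addr = tba & ~(uint64_t)0x7FFF;   /* keep TBA bits 63:15 */
    if (tl_before_trap > 0)
        addr |= (uint64_t)1 << 14;             /* second half of the table */
    addr |= ((uint64_t)tt & 0x1FF) << 5;       /* 32 bytes per entry */
    return addr;
}
```

With this layout, the entry for trap type 06C₁₆ starts 128 bytes (32 instructions) after the entry for 068₁₆, so a fast_data_access_MMU_miss handler can occupy entries 068₁₆ through 06B₁₆ without a branch.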


■ fast_data_access_protection [TT = 06C₁₆] (Precise) — During an attempted
data write access (by a store or load-store instruction), the instruction had
appropriate access privilege but the MMU signalled that the location was
write-protected (a write to a read-only location (TTE.w = 0)). Four trap vectors are
allocated for this trap, allowing a trap handler of up to 32 instructions to fit
within the trap vector area.


Note that on an UltraSPARC Architecture virtual processor, an attempt to read or
write to a privileged location while in nonprivileged mode causes the
higher-priority data_access_exception exception instead of this exception.


■ fast_instruction_access_MMU_miss [TT = 064₁₆] (Precise) — During an
attempted instruction virtual address access, the MMU detected a TLB miss.
Four trap vectors are allocated for this trap, allowing a trap handler of up to 32
instructions to fit within the trap vector area.


■ fill_n_normal [TT = 0C0₁₆–0DF₁₆] (Precise)

■ fill_n_other [TT = 0E0₁₆–0FF₁₆] (Precise)

A RESTORE or RETURN instruction has determined that the contents of a
register window must be restored from memory.

■ fp_disabled [TT = 020₁₆] (Precise) — An attempt was made to execute an FPop, a
floating-point branch, or a floating-point load/store instruction while an FPU was
disabled (PSTATE.pef = 0 or FPRS.fef = 0).

■ fp_exception_ieee_754 [TT = 021₁₆] (Precise) — An FPop instruction generated
an IEEE 754 exception and its corresponding trap enable mask (FSR.tem) bit was
1. The floating-point exception type, IEEE_754_exception, is encoded in
FSR.ftt, and specific IEEE 754 exception information is encoded in FSR.cexc.
■ fp_exception_other [TT = 022₁₆] (Precise) — An FPop instruction generated an
exception other than an IEEE 754 exception. Examples: the FPop is
unimplemented or execution of an FPop requires software assistance to complete.
The floating-point exception type is encoded in FSR.ftt.


■ guest_watchdog [TT = (see text)] (Precise, Disrupting) — The virtual processor
was in nonprivileged or privileged mode, TL was ≥ MAXPTL, and a precise or
disrupting exception to privileged mode occurred. guest_watchdog uses the same
trap table entry (table offset 040₁₆) as watchdog_reset. When a guest_watchdog
trap occurs, the trap type (TT) value and priority of the exception that caused the
trap are retained.


■ hstick_match [TT = 05E₁₆] (Disrupting) — This interrupt indicates that a match
between the System Tick (STICK) and the Hypervisor System Tick Compare
(HSTICK_CMPR) register has occurred (or that software has set HINTP.hsp = 1).
The event is recorded in the hstick_match pending (hsp) bit of the Hypervisor
Interrupt Pending (HINTP) register. The hstick_match disrupting trap is
recognized when HINTP.hsp = 1 and (PSTATE.ie = 1 or HPSTATE.hpriv = 0);
otherwise, it remains pending. HINTP.hsp provides a mechanism for
hyperprivileged software to determine that an hstick_match trap is pending while
PSTATE.ie = 0 and to clear the condition without actually having to take the
hstick_match trap.
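The recognition rule for hstick_match reduces to one boolean expression. A minimal C sketch follows; the struct and function names are hypothetical and model only the three bits the text names.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state bits named in the text. */
struct hstick_state {
    bool hintp_hsp;      /* HINTP.hsp: hstick_match is pending */
    bool pstate_ie;      /* PSTATE.ie */
    bool hpstate_hpriv;  /* HPSTATE.hpriv */
};

/* True when the hstick_match disrupting trap would be taken now;
 * otherwise the event simply stays pending in HINTP.hsp. */
bool hstick_match_taken(struct hstick_state s)
{
    return s.hintp_hsp && (s.pstate_ie || !s.hpstate_hpriv);
}
```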


■ htrap_instruction [TT = 180₁₆–1FF₁₆] (Precise) — A Tcc instruction was executed
in privileged or hyperprivileged mode, the trap condition evaluated to TRUE, and
the software trap number was greater than 127. The trap is delivered in
hyperprivileged mode, using the hyperprivileged mode trap base address
(HTBA). See also trap_instruction on page 505.





■ illegal_instruction [TT = 010₁₆] (Precise) — An attempt was made to execute an
ILLTRAP instruction, an instruction with an unimplemented opcode, an
instruction with invalid field usage, or an instruction that would result in illegal
processor state.


Note | An unimplemented FPop instruction generates an
fp_exception_other exception with ftt = 3, instead of an
illegal_instruction exception.


Examples of cases in which illegal_instruction is generated include the following:

• An instruction encoding does not match any of the opcode map definitions (see
Appendix A, Opcode Maps).

• A non-FPop instruction is not implemented in hardware.

• A reserved instruction field in a Tcc instruction is nonzero.

If a reserved instruction field in an instruction other than Tcc is nonzero, an
illegal_instruction exception should be, but is not required to be, generated.
(See Reserved Opcodes and Instruction Fields on page 134.)

• An illegal value is present in an instruction i field.

• An illegal value is present in a field that is explicitly defined for an instruction,
such as cc2, cc1, cc0, fcn, impl, op2 (IMPDEP2A, IMPDEP2B), rcond, or opf_cc.

• Illegal register alignment (such as an odd rd value in a doubleword load
instruction).

• Illegal rd value for LDXFSR, STXFSR, or the deprecated instructions LDFSR or
STFSR.

• ILLTRAP instruction.

• DONE or RETRY when TL = 0.


All causes of an illegal_instruction exception are described in individual
instruction descriptions in Chapter 7, Instructions.


■ instruction_access_error [TT = 00A₁₆] (Precise) — A hardware error occurred
during an instruction access.


■ instruction_access_exception [TT = 008₁₆] (Precise) — An exception occurred
on an instruction access. The conditions that may cause an
instruction_access_exception exception are:

• Privilege Violation — An attempt to fetch an instruction from a privileged
memory page (TTE.p = 1) while the virtual processor was executing in
nonprivileged mode.

• Unauthorized Access — An attempt to fetch an instruction from a memory
page which was missing "execute" permission (TTE.ep = 0).

• No-Fault Only Access — An attempt to fetch an instruction from a memory
page which was marked for access only by nonfaulting loads (TTE.nfo = 1).


■ instruction_breakpoint [TT = 076₁₆] (Precise) — This exception is generated if
HPSTATE.ibe = 1 and the processor has detected a breakpoint condition based on
the values in the Instruction Breakpoint Control register for the current
instruction. As part of the trap, the HPSTATE.ibe bit is cleared (set to 0).


■ instruction_invalid_TSB_entry [TT = 02A₁₆] (Precise) — During an attempted
instruction access (instruction fetch):
• the MMU detected that a translation lookaside buffer did not contain a translation
for the virtual address,
• the required TTE was found in the configured TSBs to be a real address, requiring
real-to-physical address translation, and
• the real address cannot be translated to a physical address by hardware.


■ instruction_real_translation_miss [TT = 03E₁₆] (Precise) — During an
attempted real address instruction access (instruction fetch), the MMU detected a
TLB miss.


■ internal_processor_error [TT = 029₁₆] (Precise) — A serious internal error
occurred in the virtual processor.


IMPL. DEP. #402-S10: The trap priority of the internal_processor_error exception
is implementation dependent. Furthermore, its priority may vary within an
implementation, based on the cause of the error being reported.
■ interrupt_level_n [TT = 041₁₆–04F₁₆] (Disrupting) — SOFTINT{n} was set to 1 or
an external interrupt request of level n was presented to the virtual processor and
n > PIL.

Implementation Note | interrupt_level_14 can be caused by (1) setting SOFTINT{14}
to 1, (2) occurrence of a "TICK match", or (3) occurrence of a
"STICK match" (see SOFTINT Register (ASRs 20, 21, 22) on
page 81).
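The sketch below shows one way the pending level could be selected from SOFTINT. It assumes the SOFTINT layout given in SOFTINT Register (ASRs 20, 21, 22): the TICK match bit at bit 0, the interrupt-level bits at 15:1, and the STICK match bit at bit 16. External interrupt requests and PSTATE.ie are not modeled, and the function name is hypothetical.

```c
#include <stdint.h>

#define SOFTINT_TM ((uint64_t)1 << 0)   /* assumed: TICK match bit */
#define SOFTINT_SM ((uint64_t)1 << 16)  /* assumed: STICK match bit */

/* Returns the highest interrupt level n (1..15) pending in SOFTINT with
 * n > PIL, or 0 if no pending level exceeds PIL. */
unsigned pending_interrupt_level(uint64_t softint, unsigned pil)
{
    uint64_t levels = softint & 0xFFFEu;      /* int_level bits 15:1 */
    if (softint & (SOFTINT_TM | SOFTINT_SM))
        levels |= (uint64_t)1 << 14;          /* TICK/STICK match -> level 14 */
    for (unsigned n = 15; n > pil; n--)       /* only levels above PIL */
        if (levels & ((uint64_t)1 << n))
            return n;
    return 0;
}
```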


■ LDDF_mem_address_not_aligned [TT = 035₁₆] (Precise) — An attempt was
made to execute an LDDF or LDDFA instruction and the effective address was not
doubleword aligned. (impl. dep. #109)


■ mem_address_not_aligned [TT = 034₁₆] (Precise) — A load/store instruction
generated a memory address that was not properly aligned according to the
instruction, or a JMPL or RETURN instruction generated a non-word-aligned
address. (See also Special Memory Access ASIs on page 436.)
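A sketch of the alignment checks described in the two entries above. It assumes the SPARC V9 convention, in which a word-aligned but not doubleword-aligned LDDF/LDDFA address raises LDDF_mem_address_not_aligned while an address that is not even word aligned raises mem_address_not_aligned. Whether LDDF_mem_address_not_aligned is generated at all is implementation dependent (impl. dep. #109), and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum align_result { ALIGN_OK, MEM_ADDRESS_NOT_ALIGNED, LDDF_MEM_ADDRESS_NOT_ALIGNED };

/* LDDF/LDDFA alignment check under the assumed SPARC V9 split. */
enum align_result check_lddf_alignment(uint64_t ea)
{
    if (ea & 0x3)                  /* not even word aligned */
        return MEM_ADDRESS_NOT_ALIGNED;
    if (ea & 0x7)                  /* word aligned, not doubleword aligned */
        return LDDF_MEM_ADDRESS_NOT_ALIGNED;
    return ALIGN_OK;
}

/* JMPL and RETURN targets must be word aligned. */
bool branch_target_aligned(uint64_t target)
{
    return (target & 0x3) == 0;
}
```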


■ nonresumable_error [TT = 07F₁₆] (Disrupting) — There is a valid entry in the
nonresumable error queue. This interrupt is not generated by hardware, but is
used by hyperprivileged software to inform privileged software that an error
report has been appended to the nonresumable error queue.


■ PA_watchpoint [TT = 061₁₆] (Precise) — The virtual processor has detected a load
or store to a physical address specified by the PA Watchpoint register while PA
watchpoints are enabled. Hyperprivileged software may reflect this trap back to 
privileged software as a synthetic HA watchpoint exception. 


■ pic_overflow [TT = 04F₁₆] (Disrupting) — A performance counter has overflowed
and PIL < 15. Note that this exception shares a trap type, 04F₁₆, with
interrupt_level_15. The disrupting trap caused by pic_overflow is conditioned by
PSTATE.ie.

If PSTATE.ie = 1 and PIL < 15 when the possible counter overflow is detected,
then, depending on the event being monitored by the counter, the disrupting trap
may be reported prior to retirement of the instruction that incremented the counter
to cause the possible counter overflow. Upon entry to the trap handler, TPC points
to an instruction that increments the performance counter and the counter is
within some epsilon of overflow.

If PSTATE.ie = 0 or PIL = 15 when the possible overflow is detected, the trap
remains pending and will be taken on the first instruction for which
PSTATE.ie = 1 and PIL < 15. In this case, TPC may not point to an instruction that
increments the counter.


■ power_on_reset (POR) [TT = 001₁₆] (Reset) — An external signal was asserted.
This trap is issued to bring a system reliably from the power-off to the power-on
state.


■ privileged_action [TT = 037₁₆] (Precise) — An action defined to be privileged has
been attempted while in nonprivileged mode (PSTATE.priv = 0 and
HPSTATE.hpriv = 0), or an action defined to be hyperprivileged has been
attempted while in nonprivileged or privileged mode (HPSTATE.hpriv = 0).
Examples:

• A data access by nonprivileged software using a restricted (privileged or
hyperprivileged) ASI, that is, an ASI in the range 00₁₆ to 7F₁₆ (inclusive)

• A data access by nonprivileged or privileged software using a hyperprivileged
ASI, that is, an ASI in the range 30₁₆ to 7F₁₆ (inclusive)

• Execution by nonprivileged software of an instruction with a privileged
operand value

• An attempt to read the TICK register by nonprivileged software when
nonprivileged access to TICK is disabled (TICK.npt = 1).

• An attempt to access the PIC register (using RDPIC or WRPIC) while in
nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0) and
nonprivileged access to PIC is disallowed (PCR.priv = 1).

• An attempt to execute a nonprivileged instruction with an operand value
requiring more privilege than available in the current privilege mode.
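The two ASI ranges listed above can be checked directly. The following minimal C sketch uses hypothetical names and covers only the ASI-range cases, not the TICK, PIC, or operand-value cases.

```c
#include <stdbool.h>
#include <stdint.h>

enum priv_mode { MODE_NONPRIV, MODE_PRIV, MODE_HYPERPRIV };

/* True if a data access using this ASI would raise privileged_action
 * in the given mode, per the ranges in the examples above. */
bool asi_causes_privileged_action(uint8_t asi, enum priv_mode mode)
{
    if (mode == MODE_NONPRIV)
        return asi <= 0x7F;                 /* restricted ASIs 00..7F */
    if (mode == MODE_PRIV)
        return asi >= 0x30 && asi <= 0x7F;  /* hyperprivileged ASIs 30..7F */
    return false;                           /* hyperprivileged: no ASI restriction */
}
```

For example, ASI_QUEUE (25₁₆) is restricted for nonprivileged software but lies below 30₁₆, so privileged software can use it.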


■ privileged_opcode [TT = 011₁₆] (Precise) — An attempt was made to execute a
privileged instruction while PSTATE.priv = 0.


■ RED_state_exception [TT = (see text)] (Precise) — Caused when TL = MAXTL − 1
and a trap occurs, an event that brings the virtual processor into RED_state.
Uses the trap vector entry reserved for trap type 005₁₆, but the trap type recorded
in TT is the trap type of the original exception that triggered
RED_state_exception.





■ resumable_error [TT = 07E₁₆] (Disrupting) — There is a valid entry in the
resumable error queue. This interrupt is used to inform privileged software that
an error report has been appended to the resumable error queue, and the current
instruction stream is in a consistent state so that execution can be resumed after
the error is handled.


■ software_initiated_reset (SIR) [TT = 004₁₆] (Precise) — Caused by the execution
of the SIR instruction. It allows system software to reset the virtual processor.


■ spill_n_normal [TT = 080₁₆–09F₁₆] (Precise)
■ spill_n_other [TT = 0A0₁₆–0BF₁₆] (Precise)

A SAVE or FLUSHW instruction has determined that the contents of a register
window must be saved to memory.


■ STDF_mem_address_not_aligned [TT = 036₁₆] (Precise) — An attempt was
made to execute an STDF or STDFA instruction and the effective address was not
doubleword aligned. (impl. dep. #110)


■ store_error [TT = 007₁₆] (Deferred) — An error has been detected on a store
instruction that prevents it from completing, but the error was detected after the
store had passed its instruction retirement point. Since the store cannot be made
globally visible, the software thread that issued the store must be terminated.
Therefore, this is a termination deferred trap.


IMPL. DEP. #218-U3-Cs20: Whether the async_data_error exception is implemented
is implementation dependent. If it does exist, it indicates that an error is detected
in a processor core and its trap type is 40₁₆.
■ tag_overflow [TT = 023₁₆] (Precise) (deprecated C2) — A TADDccTV or
TSUBccTV instruction was executed, and either 32-bit arithmetic overflow
occurred or at least one of the tag bits of the operands was nonzero.

■ trap_instruction [TT = 100₁₆–17F₁₆] (Precise) — A Tcc instruction was executed,
the trap condition evaluated to TRUE, and the software trap number operand
of the instruction is 127 or less.
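Together, trap_instruction and htrap_instruction cover TT = 100₁₆ + software trap number for trap numbers 0 through 255. The sketch below shows that mapping. It assumes the trap number has already been computed from the Tcc operands and does not model how the number is masked in each privilege mode. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Trap type for a Tcc whose condition evaluated to TRUE;
 * sw_trap_number is assumed to lie in 0..255. */
unsigned tcc_trap_type(unsigned sw_trap_number)
{
    return 0x100u + (sw_trap_number & 0xFFu);
}

/* Trap numbers above 127 select htrap_instruction, which the text
 * restricts to privileged or hyperprivileged mode. */
bool tcc_is_htrap(unsigned sw_trap_number)
{
    return (sw_trap_number & 0xFFu) > 127u;
}
```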





■ trap_level_zero [TT = 05F₁₆] (Precise) — This exception indicates a simultaneous
existence of three conditions as an instruction is about to be executed:
• trap_level_zero exceptions are enabled (HPSTATE.tlz = 1),
• the virtual processor is in nonprivileged or privileged mode
(HPSTATE.hpriv = 0), and
• the trap level (TL) register's value is zero (TL = 0).


Upon entry to the trap handler for trap_level_zero, TPC points to the instruction
that was about to be executed after all three of these conditions were met.


Programming Note | The purpose of this trap is to improve efficiency when
descheduling a virtual processor. When a descheduling event
occurs and the virtual processor is executing in privileged mode
at TL > 0, hyperprivileged software can choose to enable the
trap_level_zero exception (set HPSTATE.tlz ← 1) and return to
privileged mode, enabling privileged software to complete its
TL > 0 processing. When privileged code returns to TL = 0, this
exception enables the hyperprivileged code to regain control
and deschedule the virtual processor with low overhead.
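The three conditions that raise trap_level_zero translate into a single predicate. The following minimal C sketch uses hypothetical names and models only the state the text names.

```c
#include <stdbool.h>

/* Hypothetical state snapshot, evaluated before each instruction executes. */
struct tlz_state {
    bool hpstate_tlz;    /* HPSTATE.tlz: trap_level_zero enabled */
    bool hpstate_hpriv;  /* HPSTATE.hpriv */
    unsigned tl;         /* TL register */
};

/* All three conditions must hold at the same time. */
bool trap_level_zero_pending(struct tlz_state s)
{
    return s.hpstate_tlz && !s.hpstate_hpriv && s.tl == 0;
}
```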





■ unimplemented_LDTW [TT = 012₁₆] (Precise) — An attempt was made to execute
an LDTW instruction that is not implemented in hardware on this
implementation (impl. dep. #107-V9).

■ unimplemented_STTW [TT = 013₁₆] (Precise) — An attempt was made to execute
an STTW instruction that is not implemented in hardware on this implementation
(impl. dep. #108-V9).

■ watchdog_reset (WDR) [TT = 002₁₆] (Reset) — This trap occurs in error state
and causes a transition to RED_state (impl. dep. #254-U3-Cs10).





■ VA_watchpoint [TT = 062₁₆] (Precise) — The virtual processor has detected an
attempt to access a virtual address specified by the VA Watchpoint register, while
VA watchpoints are enabled and the address is being translated from a virtual
address to a physical address. If the load or store address is not being translated
from a virtual address (for example, the address is being treated as a real
address), then a VA_watchpoint exception will not be generated even if a match is
detected between the VA Watchpoint register and a load or store address. This
exception is always masked in hyperprivileged mode; therefore, a VA_watchpoint
trap cannot occur in hyperprivileged mode (even if memory is accessed using
ASI_AS_IF_USER_PRIMARY or ASI_AS_IF_USER_SECONDARY).
SPARC V9 Traps Not Used in UltraSPARC Architecture 2005


The following traps were optional in the SPARC V9 specification and are not used in 
UltraSPARC Architecture 2005: 





■ data_access_protection [TT = 033₁₆] (Precise or Deferred) — This exception is
generally superseded by fast_data_access_protection (see page 500).


■ fast_ECC_error [TT = 070₁₆] (Precise) — A single-bit or multiple-bit ECC error
was detected. IMPL. DEP. #202-U3: Whether or not a fast_ECC_error trap exists is
implementation dependent. If it does exist, it indicates that an ECC error was
detected in an external cache and its trap type is 070₁₆.


■ implementation_dependent_exception_n [TT = 077₁₆–07A₁₆] — This range of
implementation-dependent exceptions has been replaced by a set of
architecturally-defined exceptions. (impl. dep. #35-V8-Cs20)


■ LDQF_mem_address_not_aligned [TT = 038₁₆] (Precise) — An attempt was
made to execute an LDQF instruction and the effective address was word aligned
but not quadword aligned. Use of this exception is implementation dependent
(impl. dep. #111-V9-Cs10). A separate trap entry for this exception supports fast
software emulation of the LDQF instruction when the effective address is word
aligned but not quadword aligned. See Load Floating-Point Register on page 251.
(impl. dep. #111)

■ STQF_mem_address_not_aligned [TT = 039₁₆] (Precise) — An attempt was
made to execute an STQF instruction and the effective address was word aligned
but not quadword aligned. Use of this exception is implementation dependent
(impl. dep. #112-V9-Cs10). A separate trap entry for the exception supports fast
software emulation of the STQF instruction when the effective address is word
aligned but not quadword aligned. See Store Floating-Point on page 339. (impl. dep.
#112)


12.8 Register Window Traps 


Window traps are used to manage overflow and underflow conditions in the register 
windows, support clean windows, and implement the FLUSHW instruction. 
12.8.1 Window Spill and Fill Traps


A window overflow occurs when a SAVE instruction is executed and the next 
register window is occupied (CANSAVE = 0). An overflow causes a spill trap that 
allows privileged software to save the occupied register window in memory, thereby 
making it available for use. 


A window underflow occurs when a RESTORE instruction is executed and the 
previous register window is not valid (CANRESTORE = 0). An underflow causes a 
fill trap that allows privileged software to load the registers from memory. 


12.8.2 clean_window Trap


The virtual processor provides the clean_window trap so that system software can
create a secure environment in which it is guaranteed that data cannot inadvertently
leak through register windows from one software program to another.


A clean register window is one in which all of the registers, including uninitialized 
registers, contain either 0 or data assigned by software executing in the address 
space to which the window belongs. A clean window cannot contain register values 
from another process, that is, from software operating in a different address space. 


Supervisor software specifies the number of windows that are clean with respect to 
the current address space in the CLEANWIN register. This number includes register 
windows that can be restored (the value in the CANRESTORE register) and the 
register windows following CWP that can be used without cleaning. Therefore, the 
number of clean windows available to be used by the SAVE instruction is 


CLEANWIN − CANRESTORE


The SAVE instruction causes a clean_window exception if this value is 0. This
behavior allows supervisor software to clean a register window before it is accessed
by a user.
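The conditions under which SAVE traps can be combined in a small C sketch. Checking CANSAVE before the clean-window count follows the SPARC V9 definition of SAVE; it is an assumption here, since this section does not state the order. The struct and function names are hypothetical.

```c
/* Hypothetical snapshot of the window-management registers. */
struct win_state {
    unsigned cansave;
    unsigned canrestore;
    unsigned cleanwin;
};

enum save_result { SAVE_OK, SAVE_SPILL, SAVE_CLEAN_WINDOW };

/* Which trap, if any, a SAVE would raise. */
enum save_result classify_save(struct win_state w)
{
    if (w.cansave == 0)
        return SAVE_SPILL;             /* next window occupied: overflow */
    if (w.cleanwin == w.canrestore)    /* CLEANWIN - CANRESTORE == 0 */
        return SAVE_CLEAN_WINDOW;      /* no clean window left for SAVE */
    return SAVE_OK;
}
```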


12.8.3 Vectoring of Fill/Spill Traps


To make handling of fill and spill traps efficient, the SPARC V9 architecture provides 
multiple trap vectors for the fill and spill traps. These trap vectors are determined as 
follows: 


■ Supervisor software can mark a set of contiguous register windows as belonging
to an address space different from the current one. The count of these register
windows is kept in the OTHERWIN register. A separate set of trap vectors
(fill_n_other and spill_n_other) is provided for spill and fill traps for these register
windows (as opposed to register windows that belong to the current address
space).
■ Supervisor software can specify the trap vectors for fill and spill traps by
presetting the fields in the WSTATE register. This register contains two subfields,
each three bits wide. The WSTATE.normal field determines one of eight spill (fill)
vectors to be used when the register window to be spilled (filled) belongs to the
current address space (OTHERWIN = 0). If the OTHERWIN register is nonzero, the
WSTATE.other field selects one of eight fill_n_other (spill_n_other) trap vectors.


See Trap-Table Entry Addresses on page 469 for more details on how the trap address
is determined.
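A sketch of how the WSTATE fields and OTHERWIN select a spill or fill trap type. It assumes the SPARC V9 encoding: WSTATE.normal in bits 2:0, WSTATE.other in bits 5:3, and four trap-table entries per vector, so the trap type advances by 4 per vector number. Under those assumptions, the values it produces fall within the TT ranges given for spill_n_normal, spill_n_other, fill_n_normal, and fill_n_other. The function name is hypothetical.

```c
#include <stdbool.h>

/* Trap type for a spill (is_fill = false) or fill (is_fill = true) trap. */
unsigned window_trap_type(bool is_fill, unsigned wstate, unsigned otherwin)
{
    unsigned base = is_fill ? 0x0C0u : 0x080u;  /* fill_0_normal / spill_0_normal */
    unsigned n;
    if (otherwin != 0) {
        base += 0x20u;                  /* *_n_other vectors */
        n = (wstate >> 3) & 0x7u;       /* WSTATE.other */
    } else {
        n = wstate & 0x7u;              /* WSTATE.normal */
    }
    return base + 4u * n;               /* four entries per vector */
}
```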


12.8.4 CWP on Window Traps 


On a window trap, the CWP is set to point to the window that must be accessed by 
the trap handler, as follows. 


Note | All arithmetic on CWP is done modulo N_REG_WINDOWS.


■ If the spill trap occurs because of a SAVE instruction (when CANSAVE = 0), there
is an overlap window between the CWP and the next register window to be
spilled:

CWP ← (CWP + 2) mod N_REG_WINDOWS

If the spill trap occurs because of a FLUSHW instruction, there can be unused
windows (CANSAVE) in addition to the overlap window between the CWP and
the window to be spilled:

CWP ← (CWP + CANSAVE + 2) mod N_REG_WINDOWS

Implementation Note | All spill traps can set CWP by using the calculation:
CWP ← (CWP + CANSAVE + 2) mod N_REG_WINDOWS
since CANSAVE is 0 whenever a trap occurs because of a SAVE
instruction.

■ On a fill trap, the window preceding CWP must be filled:

CWP ← (CWP − 1) mod N_REG_WINDOWS

■ On a clean_window trap, the window following CWP must be cleaned. Then

CWP ← (CWP + 1) mod N_REG_WINDOWS
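The three CWP updates can be written directly in C. In this minimal sketch the function names are hypothetical and `nwin` stands for N_REG_WINDOWS. The fill case adds `nwin` before taking the remainder because subtracting 1 from an unsigned CWP of 0 would wrap around instead of producing the modular result.

```c
/* CWP for a spill trap; correct for both SAVE (CANSAVE = 0) and FLUSHW. */
unsigned cwp_for_spill(unsigned cwp, unsigned cansave, unsigned nwin)
{
    return (cwp + cansave + 2) % nwin;
}

/* CWP for a fill trap: the window preceding CWP. */
unsigned cwp_for_fill(unsigned cwp, unsigned nwin)
{
    return (cwp + nwin - 1) % nwin;   /* add nwin to avoid unsigned wraparound */
}

/* CWP for a clean_window trap: the window following CWP. */
unsigned cwp_for_clean_window(unsigned cwp, unsigned nwin)
{
    return (cwp + 1) % nwin;
}
```

With eight windows, for example, a SAVE-caused spill at CWP = 7 moves CWP to 1, and a fill at CWP = 0 moves it to 7.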


12.8.5 Window Trap Handlers


The trap handlers for fill, spill, and clean_window traps must handle the trap 
appropriately and return, by using the RETRY instruction, to reexecute the trapped 
instruction. The state of the register windows must be updated by the trap handler, 
and the relationships among CLEANWIN, CANSAVE, CANRESTORE, and 
OTHERWIN must remain consistent. Follow these recommendations: 
■ A spill trap handler should execute the SAVED instruction for each window that
it spills.

■ A fill trap handler should execute the RESTORED instruction for each window
that it fills.

■ A clean_window trap handler should increment CLEANWIN for each window that
it cleans:

CLEANWIN ← CLEANWIN + 1
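The bookkeeping that SAVED, RESTORED, and a clean_window handler perform can be modeled in C to check that the window registers stay consistent. This sketch assumes the SPARC V9 definitions of SAVED and RESTORED and the V9 invariant CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS − 2. Neither is restated in this section, and all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the window-management registers. */
struct win_regs { unsigned cansave, canrestore, otherwin, cleanwin; };

/* Assumed SPARC V9 invariant relating the three window counts. */
bool windows_consistent(struct win_regs w, unsigned nwin)
{
    return w.cansave + w.canrestore + w.otherwin == nwin - 2;
}

/* Effect of SAVED, executed once per window a spill handler writes out. */
void do_saved(struct win_regs *w)
{
    w->cansave++;
    if (w->otherwin == 0)
        w->canrestore--;
    else
        w->otherwin--;
}

/* Effect of RESTORED, executed once per window a fill handler reloads. */
void do_restored(struct win_regs *w, unsigned nwin)
{
    w->canrestore++;
    if (w->otherwin == 0)
        w->cansave--;
    else
        w->otherwin--;
    if (w->cleanwin < nwin - 1)
        w->cleanwin++;
}

/* A clean_window handler increments CLEANWIN after cleaning one window. */
void clean_window_done(struct win_regs *w)
{
    w->cleanwin++;
}
```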
CHAPTER 13


Interrupt Handling





Virtual processors and I/O devices can interrupt a selected virtual processor by 
assembling and sending an interrupt packet. The contents of the interrupt packet are 
defined by software convention. Thus, hardware interrupts and cross-calls can have 
the same hardware mechanism for interrupt delivery and share a common software 
interface for processing. 


The interrupt mechanism is a two-step process: 


■ sending of an interrupt request (through an implementation-specific hardware
mechanism) to an interrupt queue of the target virtual processor


■ receipt of the interrupt request on the target virtual processor and scheduling
software handling of the interrupt request


Privileged software running on a virtual processor can schedule interrupts to itself 
(typically, to process queued interrupts at a later time) by setting bits in the 
privileged SOFTINT register (see Software Interrupt Register (SOFTINT) on page 512). 


Programming Note | An interrupt request packet is sent by an interrupt source
(through an implementation-specific mechanism) and is
received by the specified target in an interrupt queue. Upon
receipt of an interrupt request packet, a special trap is invoked
on the target virtual processor. The trap handler software
invoked in the target virtual processor then schedules itself to
later handle the interrupt request by posting an interrupt in the
SOFTINT register at the desired interrupt level.





The following sections describe these aspects of interrupt handling:
■ Interrupt Packets on page 512.

■ Software Interrupt Register (SOFTINT) on page 512.

■ Interrupt Queues on page 513.

■ Interrupt Traps on page 515.

■ Strand Interrupt ID Register (STRAND_INTR_ID) on page 516.
13.1 Interrupt Packets


Each interrupt is accompanied by data, referred to as an “interrupt packet”. An 
interrupt packet is 64 bytes long, consisting of eight 64-bit doublewords. The 
contents of these data are defined by software convention. 





13.2 Software Interrupt Register (SOFTINT)


To schedule interrupt vectors for processing at a later time, privileged software
running on a virtual processor can send itself signals (interrupts) by setting bits in
the privileged SOFTINT register. Similarly, hyperprivileged software can schedule
interrupt vectors for privileged software running on the same virtual processor by
setting bits in SOFTINT.


See SOFTINT Register (ASRs 20, 21, 22) on page 81 for a detailed description of the
SOFTINT register.


Programming Note | The SOFTINT register (ASR 16₁₆) is used for communication
from nucleus (privileged, TL > 0) software to privileged software
running with TL = 0. Interrupt packets and other service
requests can be scheduled in queues or mailboxes in memory by
the nucleus, which then sets SOFTINT{n} to cause an interrupt
at level n.


Programming Note | The SOFTINT mechanism is independent of the “mondo”
interrupt mechanism mentioned in Interrupt Queues on page 513.
The two mechanisms do not interact.





13.2.1 Setting the Software Interrupt Register


SOFTINT{n} is set to 1 by executing a WRSOFTINT_SET instruction (WRasr using
ASR 20) with a ‘1’ in bit n of the value written (bit n corresponds to interrupt level
n). The value written to the SOFTINT_SET register is effectively ORed into the
SOFTINT register. This approach allows the interrupt handler to set one or more
bits in the SOFTINT register with a single instruction.


See SOFTINT_SET Pseudo-Register (ASR 20) on page 82 for a detailed description of
the SOFTINT_SET pseudo-register.
13.2.2 Clearing the Software Interrupt Register


When all interrupts scheduled for service at level n have been serviced, kernel
software executes a WRSOFTINT_CLR instruction (WRasr using ASR 21) with a ‘1’
in bit n of the value written, to clear interrupt level n (impl. dep. #34-V8a). The
complement of the value written to the SOFTINT_CLR register is effectively ANDed
with the SOFTINT register. This approach allows the interrupt handler to clear one
or more bits in the SOFTINT register with a single instruction.


Programming Note | To avoid a race condition between operating system kernel
software clearing an interrupt bit and nucleus software setting
it, software should (again) examine the queue for any valid
entries after clearing the interrupt bit.


See SOFTINT_CLR Pseudo-Register (ASR 21) on page 83 for a detailed description of
the SOFTINT_CLR pseudo-register.
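The following C sketch illustrates the set and clear semantics described above, together with the clear-then-recheck order from the Programming Note. SOFTINT is modeled as a plain 64-bit value rather than accessed through WRasr. `service_level` and its two callbacks are a hypothetical illustration of the race-avoidance pattern, not an interface defined by the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* WRSOFTINT_SET (ASR 20): the written value is ORed into SOFTINT. */
uint64_t softint_after_set(uint64_t softint, uint64_t value)
{
    return softint | value;
}

/* WRSOFTINT_CLR (ASR 21): the complement of the value is ANDed in. */
uint64_t softint_after_clr(uint64_t softint, uint64_t value)
{
    return softint & ~value;
}

/* Service level n: drain the queue, clear bit n, then look again,
 * because nucleus code may have queued work and set bit n just
 * before the clear. has_entry/handle_entry are hypothetical. */
void service_level(uint64_t *softint, unsigned n,
                   bool (*has_entry)(void), void (*handle_entry)(void))
{
    while (has_entry())
        handle_entry();
    *softint = softint_after_clr(*softint, (uint64_t)1 << n);
    while (has_entry())
        handle_entry();
}
```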





13.3 Interrupt Queues


Interrupts are indicated to privileged mode via circular interrupt queues, each with 
an associated trap vector. There are 4 interrupt queues, one for each of the following 
types of interrupts: 


■ Device mondos¹

■ CPU mondos

■ Resumable errors

■ Nonresumable errors

New interrupt entries are appended to the tail of a queue (by hardware or by
hyperprivileged software) and privileged software reads them from the head of the
queue.


Programming Note | Software conventions for cooperative management of interrupt
queues and the format of queue entries are specified in the
separate Hypervisor API Specification document.


13.3.1 Interrupt Queue Registers


The active contents of each queue are delineated by a 64-bit head register and a
64-bit tail register.


¹ “mondo” is a historical term, referring to the name of the original UltraSPARC I bus
transaction in which these interrupts were introduced.
IMPL. DEP. #421-S10: It is implementation dependent whether interrupt queue
head and tail registers (a) are datatype-agnostic "scratch registers" used for 
communication between privileged and hyperprivileged software, in which case 
their contents are defined purely by software convention, or (b) are maintained to 
some degree by virtual processor hardware, imposing a fixed meaning on their 
contents. 


Programming Note | If the contents of Queue Head and Tail registers are set only by
software convention in a given implementation, software could
place any type of data in them (such as addresses, address
offsets, or index values).

It is expected that Queue Head and Tail registers will typically
contain a byte offset from the base of an appropriately-aligned
queue region in memory.














The interrupt queue registers are accessed through ASI ASI_QUEUE (25₁₆). The ASI
and address assignments for the interrupt queue registers are provided in TABLE 13-1.





TABLE 13-1 Interrupt Queue Register ASI Assignments

Register | ASI | Virtual Address | Privileged mode Access | Hyperprivileged mode Access
CPU Mondo Queue Head | 25₁₆ (ASI_QUEUE) | 3C0₁₆ | RW | RW
CPU Mondo Queue Tail | 25₁₆ (ASI_QUEUE) | 3C8₁₆ | R or RW† | RW
Device Mondo Queue Head | 25₁₆ (ASI_QUEUE) | 3D0₁₆ | RW | RW
Device Mondo Queue Tail | 25₁₆ (ASI_QUEUE) | 3D8₁₆ | R or RW† | RW
Resumable Error Queue Head | 25₁₆ (ASI_QUEUE) | 3E0₁₆ | RW | RW
Resumable Error Queue Tail | 25₁₆ (ASI_QUEUE) | 3E8₁₆ | R or RW† | RW
Nonresumable Error Queue Head | 25₁₆ (ASI_QUEUE) | 3F0₁₆ | RW | RW
Nonresumable Error Queue Tail | 25₁₆ (ASI_QUEUE) | 3F8₁₆ | R or RW† | RW

† See IMPL. DEP. #422-S10.
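The register layout in TABLE 13-1 is regular: head registers start at 3C0₁₆ and are 16 bytes apart, and each tail register sits 8 bytes above its head. The following C sketch computes those virtual addresses. It assumes nothing beyond the table itself; the enum and function names are hypothetical, and in privileged code the returned value would serve as the address operand of an LDXA or STXA that specifies ASI 25₁₆.

```c
#include <assert.h>
#include <stdint.h>

/* ASI used for all interrupt queue registers (TABLE 13-1). */
#define ASI_QUEUE 0x25

/* Hypothetical names for the four queues, in table order. */
enum intr_queue {
    CPU_MONDO_QUEUE = 0,
    DEV_MONDO_QUEUE = 1,
    RESUMABLE_ERROR_QUEUE = 2,
    NONRESUMABLE_ERROR_QUEUE = 3
};

/* Virtual address (within ASI_QUEUE) of a queue's head register
 * (is_tail == 0) or tail register (is_tail != 0). */
static uint64_t queue_reg_va(enum intr_queue q, int is_tail)
{
    assert((unsigned)q <= NONRESUMABLE_ERROR_QUEUE);
    /* Heads: 3C0, 3D0, 3E0, 3F0 (hex); each tail is head + 8. */
    return 0x3C0u + 0x10u * (uint64_t)q + (is_tail ? 0x8u : 0x0u);
}
```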


IMPL. DEP. #422-S10: It is implementation dependent whether tail registers are
writable in privileged mode. If a tail register is read-only in privileged mode, an
attempt to write to it causes a data_access_exception exception. If a tail register is
writable in privileged mode, an attempt to write to it results in undefined behavior.


Implementation Note: Although Queue Head and Tail registers behave as registers, they may or may not be implemented using actual hardware registers. For example, they may reside in memory, mapped by a mechanism visible only to hyperprivileged software. In any case, the means by which Queue Head and Tail registers are implemented is not visible to privileged software.
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The status of each queue is reflected by its head and tail registers: 


• A Queue Head register indicates the location of the oldest interrupt packet in the queue.
• A Queue Tail register indicates the location where the next interrupt packet will be stored.


An event that results in the insertion of a queue entry causes the tail register for that 
queue to refer to the following entry in the circular queue. Privileged code is 
responsible for updating the head register appropriately when it removes an entry 
from the queue. 


A queue is empty when the contents of its head and tail registers are equal. A queue 
is full when the insertion of one more entry would cause the contents of its head and 
tail registers to become equal. 
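The empty and full conditions translate directly into code. This sketch assumes, as in the register-format convention described in the Programming Note below, that head and tail hold byte offsets into a queue whose size is a multiple of 64 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_ENTRY_BYTES 64u /* one queue entry, per current convention */

/* Empty: the head and tail registers hold equal values. */
static bool queue_is_empty(uint64_t head, uint64_t tail)
{
    return head == tail;
}

/* Full: inserting one more entry (advancing tail by one entry, modulo the
 * queue size in bytes) would make tail equal to head. */
static bool queue_is_full(uint64_t head, uint64_t tail, uint64_t queue_bytes)
{
    assert(queue_bytes > 0 && queue_bytes % QUEUE_ENTRY_BYTES == 0);
    return (tail + QUEUE_ENTRY_BYTES) % queue_bytes == head;
}
```

Because "full" is detected one entry before head and tail would coincide, one slot always stays unused: a 512-byte (eight-entry) queue reports full once seven entries are outstanding.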


Programming Note: By current convention, the format of a Queue Head or Tail register is as follows:

   63                          6 5        0
  | head/tail offset             | 000000 |

Under this convention:

• updating a Queue Head register involves incrementing it by 64 (the size of a queue entry, in bytes)
• Queue Head and Tail registers are updated using modular arithmetic (modulo the size of the circular queue, in bytes)
• bits 5:0 always read as zeros, and attempts to write to them are ignored
• the maximum queue offset for an interrupt queue is implementation dependent
• behavior when a queue register is written with a value larger than the maximum queue offset (queue length minus the length of the last entry) is undefined

This is merely a convention and is subject to change.
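A minimal sketch of how privileged software might consume entries under this convention, assuming 64-byte entries and byte-offset head and tail values. The function names and the handler callback are hypothetical. Real code would obtain head and tail with LDXA from ASI_QUEUE and write the returned head back with STXA; this portable C version takes them as plain arguments instead. Since the convention itself may change, treat this as an illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_ENTRY_BYTES 64u
#define QUEUE_OFFSET_MASK (~(uint64_t)0x3F) /* bits 5:0 read as zero */

/* Advance an offset by one entry, modulo the queue size in bytes. */
static uint64_t queue_advance(uint64_t offset, uint64_t queue_bytes)
{
    assert(queue_bytes > 0 && queue_bytes % QUEUE_ENTRY_BYTES == 0);
    return ((offset & QUEUE_OFFSET_MASK) + QUEUE_ENTRY_BYTES) % queue_bytes;
}

/* Hypothetical drain loop: pass each entry from head up to (but not
 * including) tail to handle(), then return the new head value that
 * software would write back to the Queue Head register. */
static uint64_t queue_drain(const uint8_t *base, uint64_t head, uint64_t tail,
                            uint64_t queue_bytes,
                            void (*handle)(const uint8_t *entry))
{
    assert(tail < queue_bytes && (tail & 0x3F) == 0);
    while (head != tail) {
        handle(base + head);
        head = queue_advance(head, queue_bytes);
    }
    return head;
}

/* Example handler (hypothetical): only counts the entries it sees. */
static unsigned entries_seen;
static void count_entry(const uint8_t *entry)
{
    (void)entry;
    entries_seen++;
}
```

Draining a 512-byte queue with head at 180₁₆ and tail at 40₁₆ visits the entries at 180₁₆, 1C0₁₆, and 0, wrapping past the end of the queue region, and leaves head equal to tail.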





13.4 Interrupt Traps


The following interrupt traps are defined in the UltraSPARC Architecture 2005:
cpu_mondo, dev_mondo, resumable_error, and nonresumable_error. The first three
(cpu_mondo, dev_mondo, and resumable_error) are all generated by hardware,
while nonresumable_error is generated by hyperprivileged software. See
Chapter 12, Traps, for details.
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UltraSPARC Architecture 2005 also supports the interrupt_level_n traps defined in
the SPARC V9 specification.


How interrupts are delivered is implementation-specific; see the relevant 
implementation-specific Supplement to this specification for details. 





13.5 Strand Interrupt ID Register (STRAND_INTR_ID)


The STRAND_INTR_ID per-virtual-processor register allows software to assign to a
virtual processor a 16-bit interrupt ID that is unique within the system. A unique ID
is necessary so that interrupts can be directed to that virtual processor. See Strand
Interrupt ID Register (STRAND_INTR_ID) on page 538 for details.
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CHAPTER 14


Memory Management 





An UltraSPARC Architecture Memory Management Unit (MMU) conforms to the 
requirements set forth in the SPARC V9 Architecture Manual. In particular, it supports 
a 64-bit virtual address space, simplified protection encoding, and multiple page 
sizes. 


In UltraSPARC Architecture 2005, memory management is implementation-specific. 
Basic concepts are described in this chapter, but see the relevant processor-specific 
Supplement to this specification for a detailed description of a particular processor's 
memory management facilities. 


This chapter describes the Memory Management Unit, as observed by
hyperprivileged software, in these sections:

• Virtual Address Translation on page 517.
• TSB Translation Table Entry (TTE) on page 520.
• Translation Storage Buffer (TSB) on page 524.
• Faults and Traps on page 525.





14.1 Virtual Address Translation


The MMUs may support up to four page sizes: 8 KBytes, 64 KBytes, 4 MBytes, and
256 MBytes. The 8-KByte, 64-KByte, and 4-MByte page sizes must be supported; other
page sizes are optional.


Each MMU consists of one or more Translation Lookaside Buffers (TLBs), and may 
include micro-TLB structures. Separate Instruction and Data MMUs (IMMU and 
DMMU, respectively) may be provided to enable concurrent virtual-to-physical 
address translations for instruction and data. 


IMPL. DEP. #222-U3: TLB organization is implementation dependent. 


Privileged software manages virtual-to-real address translations. Hyperprivileged 
software manages real-to-physical address translations. 
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Privileged software maintains translation information in an arbitrary data structure, 
called the software translation table. 


The Translation Storage Buffer (TSB) is an array of Translation Table Entries which 
serves as a cache of the software translation table, used to quickly reload the TLB in 
the event of a TLB miss. 


The MMU TLBs act as independent caches of the software translation table, 
providing appropriate concurrency for virtual-to-physical address translation. 


Hyperprivileged software maintains translation information for real-to-physical 
translations. 


During a memory access, one or more TLBs are searched for a VA (or RA) 
translation. A TLB hit is indicated when the virtual address, context ID, and 
partition ID (or real address and partition ID) match an entry in the TLB. 
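The hit rule can be expressed as a predicate over a TLB entry. This C sketch is hypothetical: it uses a single page size, stores each entry's virtual or real page number directly, and picks 16-bit fields for the context ID and partition ID (the partition ID width is implementation dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical TLB entry holding only the match fields. */
struct tlb_entry {
    bool     valid;
    bool     real;         /* true: RA->PA entry; false: VA->PA entry */
    uint16_t partition_id;
    uint16_t context_id;   /* not used for real-address entries */
    uint64_t page_number;  /* virtual or real page number */
};

/* VA translation: page number, context ID, and partition ID must match. */
static bool tlb_hit_va(const struct tlb_entry *e, uint64_t vpn,
                       uint16_t ctx, uint16_t pid)
{
    return e->valid && !e->real && e->page_number == vpn &&
           e->context_id == ctx && e->partition_id == pid;
}

/* RA translation: only the page number and partition ID take part. */
static bool tlb_hit_ra(const struct tlb_entry *e, uint64_t rpn, uint16_t pid)
{
    return e->valid && e->real && e->page_number == rpn &&
           e->partition_id == pid;
}
```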


A TLB miss is indicated when no such match occurs, and the MMU immediately 
traps to hyperprivileged software for TLB miss processing. The TLB miss handler can 
fill the TLB by any available means, but it is likely to take advantage of the TLB miss 
support features provided by the MMU, since the TLB miss handler is time-critical 
code. 


A conceptual view of privileged-mode memory management and the MMU is shown in
FIGURE 14-1. The TLBs, which are part of the MMU hardware, are small and fast. The
software translation table is likely to be large and complex. The translation storage
buffer (TSB), which acts like a direct-mapped cache, is the interface between the
software translation table and the underlying memory management hardware. The
TSB can be shared by all processes running on a virtual processor or can be
process-specific; the hardware does not require any particular scheme. There can be
several TSBs.


The UltraSPARC Architecture provides a memory partitioning mechanism that
allows for multiple partitions, each containing its own real address space.
Hyperprivileged software provides real address to physical address translations.
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[Figure not reproduced. Labeled elements: Translation Lookaside Buffers (TLBs) in the MMU and Real Page Number to Physical Page Number translation (PA ← RA), managed by hyperprivileged mode; Translation Storage Buffer (TSB) in memory and Software Translation Table data structure in the operating system (RA ← VA), managed by privileged mode software.]

FIGURE 14-1 Conceptual View of the MMU


Aliasing of multiple virtual addresses to the same physical address is supported. 
However, the reverse case of multiple mappings from one virtual address to 
multiple physical addresses producing a multiple TLB match is detected in hardware 
as a multiple tag hit TLB error. 
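The asymmetry follows from how lookups are keyed. The sketch below assumes a TLB searched by (context ID, virtual page number): two different VAs that map to the same PA are distinct keys and each produces a single hit, while two entries sharing one key both match and are reported as a multiple hit. The structure and names are hypothetical, and partition IDs are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry keyed by (context ID, VPN). */
struct tlb_map {
    uint16_t context_id;
    uint64_t vpn; /* virtual page number */
    uint64_t ppn; /* physical page number */
};

enum tlb_result { TLB_MISS, TLB_HIT, TLB_MULTI_HIT };

/* Search every entry and count matches on the lookup key. */
static enum tlb_result tlb_lookup(const struct tlb_map *tlb, size_t n,
                                  uint16_t ctx, uint64_t vpn, uint64_t *ppn)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (tlb[i].context_id == ctx && tlb[i].vpn == vpn) {
            hits++;
            *ppn = tlb[i].ppn;
        }
    }
    if (hits == 0)
        return TLB_MISS;
    /* More than one match: the multiple tag hit error case. */
    return hits == 1 ? TLB_HIT : TLB_MULTI_HIT;
}
```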





14.2 Hyperprivileged Memory Management Architecture


The intent of the hyperprivileged memory management architecture is to provide a
memory addressing capability for a virtualized architecture while removing the
explicit dependence on hardware mechanisms for virtual memory management.
Mechanisms are provided to allow privileged mode to manipulate the memory made
available to it, and in turn to virtualize that memory and make it available to its
nonprivileged mode processes.


14.2.1 Partition ID


The hyperprivileged memory architecture has a partition ID, which separates the 
real addresses of each partition in the same way that context IDs separate virtual 
address spaces within a single real address space. Hyperprivileged mode provides 
the partition ID to create multiple real address spaces. It uses the partition ID 
register to associate addresses with their partition ID. 
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The full representation of a memory address is:

  virtual address: <partition ID> :: <context ID> :: <address>
  real address: <partition ID> :: <address>
  physical address: <address>

Nonprivileged mode uses only virtual addresses.


Privileged mode uses virtual addresses and real addresses, and manages the
allocation of context IDs.


Hyperprivileged mode uses physical addresses (and explicit ASI virtual and real
addresses) and manages the allocation of partition IDs.


The partition ID field is included in each TLB entry to allow multiple guest
operating systems to share the MMU. The field is loaded with the contents of the
partition ID register when the TLB entry is loaded. In addition, the partition ID
stored in each entry of a TLB is compared against the contents of the partition ID
register to determine whether a TLB hit occurs.


See Partition ID Register on page 527 for more details.


14.2.2 Real Address Translation


The memory system supports real addresses. In addition, real addresses are
provided when the MMU is disabled in privileged mode.


The MMU supports both virtual-to-physical (VA → PA) and real-to-physical
(RA → PA) translations.


Hyperprivileged software controls the translation mechanisms from Real Page 
Numbers (RPNs) to Physical Page Numbers (PPNs). 





14.3 TSB Translation Table Entry (TTE)


The Translation Storage Buffer (TSB) Translation Table Entry (TTE) is the equivalent 
of a page table entry as defined in the Sun4v Architecture Specification; it holds 
information for a single page mapping. The TTE is divided into two 64-bit words 
representing the tag and data of the translation. Just as in a hardware cache, the tag 
is used to determine whether there is a hit in the TSB; if there is a hit, the data are 
used by privileged software. 


The TTE configuration is illustrated in FIGURE 14-2 and described in TABLE 14-1. 
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FIGURE 14-2 Translation Storage Buffer (TSB) Translation Table Entry (TTE) 


TABLE 14-1 TSB TTE Bit Description (1 of 4)

Bit | Field | Description

Tag - 63:48 | context_id | The 16-bit context ID associated with the TTE.

Tag - 47:42 | - | These bits must be zero for a tag match.

Tag - 41:0 | va | Bits 63:22 of the Virtual Address (the virtual page number). Bits 21:13 of the VA are not maintained because these bits index the minimally sized, direct-mapped TSBs.

Data - 63 | v | Valid. If v = 1, then the remaining fields of the TTE are meaningful, and the TTE can be used; otherwise, the TTE cannot be used to translate a virtual address.

    Programming Note: The explicit Valid bit is (intentionally) redundant with the software convention of encoding an invalid TTE with an unused context ID. The encoding of the context_id field is necessary to cause a failure in the TTE tag comparison, while the explicit Valid bit in the TTE data simplifies the TTE miss handler.

Data - 62 | nfo | No Fault Only. If nfo = 1, loads with ASI_PRIMARY_NO_FAULT{_LITTLE} or ASI_SECONDARY_NO_FAULT{_LITTLE} are translated. Any other data access with the D/UMMU TTE.nfo = 1 will trap with a data_access_exception exception (with SFSR.ft = 10₁₆). An instruction fetch access to a page with the IMMU TTE.nfo = 1 results in an instruction_access_exception exception.

Data - 61:56 | soft2 | Software-defined field, provided for use by the operating system. The soft2 field can be written with any value in the TSB. Hardware is not required to maintain this field in any TLB (or uTLB), so when it is read from the TLB (or uTLB), it may read as zero.
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TABLE 14-1 TSB TTE Bit Description (2 of 4)

Bit | Field | Description

Data - 55:13 | taddr | Target address; the underlying address (Real Address{55:13} or Physical Address{55:13}) to which the MMU will map the page.

    UltraSPARC Architecture TLBs store physical addresses, not real addresses. Hyperprivileged software is responsible for translation between real and physical addresses. Whether this field contains a Real or Physical address is determined by the bit in the corresponding MMU TSB Configuration register.

    IMPL. DEP. #441-S10: Whether an implementation uses the most significant physical address bit to differentiate between memory and I/O addresses is implementation dependent. If that method is used, then the most significant bit of the physical address (PA) = 1 designates I/O space and the most significant bit of PA = 0 designates memory space.

    IMPL. DEP. #224-U3: Physical address width support by the MMU is implementation dependent in the UltraSPARC Architecture; minimum PA width is 40 bits.

    IMPL. DEP. #238-U3: When page offset bits for larger page sizes are stored in the TLB, it is implementation dependent whether the data returned from those fields by a Data Access read is zero or the data previously written to them.

Data - 12 | ie | Invert Endianness. If ie = 1 for a page, accesses to the page are processed with inverse endianness from that specified by the instruction (big for little, little for big).

    Programming Notes: (1) The primary purpose of this bit is to aid in the mapping of I/O devices (through noncacheable memory addresses) whose registers contain and expect data in little-endian format. Setting TTE.ie = 1 allows those registers to be accessed correctly by big-endian programs using ordinary loads and stores, such as those typically issued by compilers; otherwise little-endian loads and stores would have to be issued by hand-written assembler code.

    (2) This bit can also be used when mapping cacheable memory. However, cacheable accesses to pages marked with TTE.ie = 1 may be slower than accesses to the page with TTE.ie = 0. For example, an access to a cacheable page with TTE.ie = 1 may perform as if there were a miss in the first-level data cache.

    Implementation Note: Some implementations may require cacheable accesses to pages tagged with TTE.ie = 1 to bypass the data cache, adding latency to those accesses.

    IMPL. DEP. #_: The ie bit in the IMMU is ignored during ITLB operation. It is implementation dependent whether it is implemented and how it is read and written.
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TABLE 14-1 TSB TTE Bit Description (3 of 4)

Bit | Field | Description

Data - 11 | e | Side effect. If the side-effect bit is set to 1, loads with ASI_PRIMARY_NO_FAULT, ASI_SECONDARY_NO_FAULT, and their *_LITTLE variations will trap for addresses within the page, noncacheable memory accesses other than block loads and stores are strongly ordered against other e-bit accesses, and noncacheable stores are not merged. This bit should be set to 1 for pages that map I/O devices having side effects. Note, also, that the e bit causes the prefetch instruction to be treated as a nop, but does not prevent normal (hardware) instruction prefetching.

    Note 1: The e bit does not force a noncacheable access. It is expected, but not required, that the cp and cv bits will be set to 0 when the e bit is set to 1. If both the cp and cv bits are set to 1 along with the e bit, the result is undefined.

    Note 2: The e bit and the nfo bit are mutually exclusive; both bits should never be set to 1 in any TTE.

Data - 10, 9 | cp, cv | The cacheable-in-physically-indexed-cache bit (cp) and cacheable-in-virtually-indexed-cache bit (cv) determine the cacheability of the page. Given an implementation with a physically indexed instruction cache, a virtually indexed data cache, and a physically indexed unified second-level cache, the following table illustrates how the cp and cv bits could be used:

    Cacheable (cp:cv) | Meaning of TTE when placed in I-TLB (Instruction Cache PA-indexed) | Meaning of TTE when placed in D-TLB (Data Cache VA-indexed)
    00, 01 | Noncacheable | Noncacheable
    10 | Cacheable L2-cache, I-cache | Cacheable L2-cache
    11 | Cacheable L2-cache, I-cache | Cacheable L2-cache, D-cache

    The MMU does not operate on the cacheable bits but merely passes them through to the cache subsystem. The cv bit in the IMMU is read as zero and ignored when written.

    IMPL. DEP. #226-U3: Whether the cv bit is supported in hardware is implementation dependent in the UltraSPARC Architecture. The cv bit in hardware should be provided if the implementation has virtually indexed caches, and the implementation should support hardware unaliasing for the caches.

Data - 8 | p | Privileged. If p = 1, only privileged software can access the page mapped by the TTE. If p = 1 and an access to the page is attempted by nonprivileged mode (PSTATE.priv = 0), then the MMU signals an instruction_access_exception exception or data_access_exception exception.

Data - 7 | ep | Executable. If ep = 1, the page mapped by this TTE has execute permission granted. Instructions may be fetched and executed from this page. If ep = 0, an attempt to execute an instruction from this page results in an instruction_access_exception exception.

    IMPL. DEP. #
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TABLE 14-1 TSB TTE Bit Description (4 of 4)

Bit | Field | Description

Data - 6 | w | Writable. If w = 1, the page mapped by this TTE has write permission granted. Otherwise, write permission is not granted, and the MMU causes a fast_data_access_protection trap if a write is attempted.

    IMPL. DEP. #___: The w bit in the IMMU is ignored during ITLB operation. It is implementation dependent whether the bit is implemented and how it is written and read.

Data - 5:4 | soft | Software-defined field, provided for use by the operating system. The soft field can be written with any value in the TSB. Hardware is not required to maintain this field in any TLB (or uTLB), so when it is read from the TLB (or uTLB), it may read as zero.

Data - 3:0 | sz | The page size of this entry, encoded as shown below.

    sz | Page Size
    0000 | 8 KByte
    0001 | 64 KByte
    0010 | Reserved
    0011 | 4 MByte
    0100 | Reserved
    0101 | 256 MByte
    0110 | Reserved
    0111 | Reserved
    1000-1111 | Reserved
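To tie TABLE 14-1 together, the following C sketch packs a TTE tag and data word and decodes the sz field. It assumes the bit positions listed in the table; the position of p (bit 8) is inferred because it lies between cv (bit 9) and ep (bit 7). The macro and function names are hypothetical, and the sketch ignores implementation-dependent details such as PA width.

```c
#include <assert.h>
#include <stdint.h>

/* TTE data bits per TABLE 14-1 (p at bit 8 is inferred). */
#define TTE_V   (1ULL << 63) /* valid */
#define TTE_NFO (1ULL << 62) /* no fault only */
#define TTE_IE  (1ULL << 12) /* invert endianness */
#define TTE_E   (1ULL << 11) /* side effect */
#define TTE_CP  (1ULL << 10) /* cacheable, physically indexed */
#define TTE_CV  (1ULL << 9)  /* cacheable, virtually indexed */
#define TTE_P   (1ULL << 8)  /* privileged */
#define TTE_EP  (1ULL << 7)  /* executable */
#define TTE_W   (1ULL << 6)  /* writable */
/* taddr occupies data bits 55:13 */
#define TTE_TADDR_MASK (((1ULL << 56) - 1) & ~((1ULL << 13) - 1))

/* Tag: context_id in 63:48, bits 47:42 zero, VA{63:22} in 41:0. */
static uint64_t tte_tag(uint16_t context_id, uint64_t va)
{
    return ((uint64_t)context_id << 48) | (va >> 22);
}

/* Page size in bytes for an sz code, or 0 for a reserved encoding. */
static uint64_t tte_page_bytes(unsigned sz)
{
    switch (sz) {
    case 0x0: return 8ULL << 10;   /* 8 KByte   */
    case 0x1: return 64ULL << 10;  /* 64 KByte  */
    case 0x3: return 4ULL << 20;   /* 4 MByte   */
    case 0x5: return 256ULL << 20; /* 256 MByte */
    default:  return 0;            /* reserved  */
    }
}

/* Valid TTE data word for a page at target address taddr. flags is an OR
 * of the attribute and permission bits above (other than TTE_V). */
static uint64_t tte_data(uint64_t taddr, uint64_t flags, unsigned sz)
{
    uint64_t bytes = tte_page_bytes(sz);

    assert(bytes != 0);                              /* sz not reserved */
    assert((taddr & (bytes - 1)) == 0);              /* page-aligned */
    assert(!((flags & TTE_E) && (flags & TTE_NFO))); /* e, nfo exclusive */
    return TTE_V | (taddr & TTE_TADDR_MASK) | flags | (uint64_t)sz;
}
```

Each defined sz code multiplies the page size by 8 per step (8 KByte × 8^sz for codes 0, 1, 3, and 5), which is why codes 2 and 4 (512 KByte and 32 MByte) are left reserved.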


14.4 Translation Storage Buffer (TSB)


The Translation Storage Buffer (TSB) is an array of Translation Table Entries
managed entirely by privileged software. It serves as a cache of the software
translation table, used to quickly reload the TLB in the event of a TLB miss.


Inclusion of the TLB entries in the TSB is not required; that is, translation
information that is not present in the TSB can exist in the TLB.


14.4.1 TSB Indexing Support


Hardware TSB indexing support via TSB pointers should be provided for the TTEs.
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14.4.2 TSB Cacheability and Consistency


The TSB exists as a data structure in memory and therefore can be cached. Indeed,
the speed of the TLB miss handler relies on the TSB accesses hitting the level-2 cache
at a substantial rate. This policy may result in some conflicts with normal
instruction and data accesses, but the dynamic sharing of the level-2 cache resource
will provide a better overall solution than that provided by a fixed partitioning.

Programming Note: When software updates the TSB, it is responsible for ensuring that the store(s) used to perform the update are made visible in the memory system (for access by subsequent loads, stores, and load-stores) by use of an appropriate MEMBAR instruction.

Making a TSB update visible to fetches of instructions subsequent to the store(s) that updated the TSB may require execution of instructions such as FLUSH, DONE, or RETRY, in addition to the MEMBAR.


14.4.3 TSB Organization


The TSB is arranged as a direct-mapped cache of TTEs.


The n least significant bits of the virtual page number are used to index the TSB,
that is, to select the TTE's offset from the TSB base address, where n equals log
base 2 of the number of TTEs in the TSB.


The TSB organization is illustrated in FIGURE 14-3. The constant n is determined by
the size field in the TSB register; the number of TTEs in the TSB (2^n) can range
from 512 (n = 9) up to an implementation-dependent maximum.






  Tag#1 (8 bytes)     | Data#1 (8 bytes)
  ...                 | ...                  (2^n lines in TSB)
  Tag#2^n (8 bytes)   | Data#2^n (8 bytes)

FIGURE 14-3 TSB Organization
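The indexing rule and the tag comparison can be sketched together. This C version assumes 16-byte TSB lines (an 8-byte tag followed by an 8-byte data word, as in FIGURE 14-3), the tag layout of TABLE 14-1, and a caller-supplied page shift (13 for 8-KByte pages). The names are hypothetical; an actual miss handler would more likely be written in assembly language and use the hardware TSB pointer support described under TSB Indexing Support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One TSB line: an 8-byte TTE tag followed by an 8-byte TTE data word. */
struct tsb_entry {
    uint64_t tag;
    uint64_t data;
};

/* Line index for va: the n least significant bits of the virtual page
 * number (va >> page_shift). The byte offset from the base is index * 16. */
static size_t tsb_index(uint64_t va, unsigned page_shift, unsigned n)
{
    assert(n >= 9 && n < 64 && page_shift < 64); /* at least 512 lines */
    return (size_t)((va >> page_shift) & ((1ULL << n) - 1));
}

/* Direct-mapped probe: exactly one candidate line. Hit if its tag equals
 * context_id:VA{63:22} and its data word has v (bit 63) set. */
static bool tsb_probe(const struct tsb_entry *tsb, unsigned n,
                      unsigned page_shift, uint16_t ctx, uint64_t va,
                      uint64_t *data_out)
{
    const struct tsb_entry *e = &tsb[tsb_index(va, page_shift, n)];
    uint64_t want = ((uint64_t)ctx << 48) | (va >> 22);

    if (e->tag != want || (e->data >> 63) == 0)
        return false;
    *data_out = e->data;
    return true;
}
```

Because the tag omits VA bits 21:13, a 512-line TSB of 8-KByte pages needs exactly those nine bits as its index to disambiguate entries, which matches the note in TABLE 14-1 that those bits index the minimally sized TSB.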





14.5 Faults and Traps


The traps recorded by the MMU are listed in TABLE 14-2. For a detailed description of 
each trap, see Chapter 12, Traps. All listed traps are precise traps. 
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TABLE 14-2 MMU Trap Types, Causes, and Stored State Register Update Policy

Ref # | Trap Name | Trap Cause | Registers Updated (Stored State in MMU) | Trap Type | # of Trap Vectors Used
1. | fast_instruction_access_MMU_miss | I-TLB miss | IMMU Tag Access | 64₁₆ | 4
2. | instruction_access_exception | Several (see below) | IMMU Tag Access | 08₁₆ | 1
3. | fast_data_access_MMU_miss | D-TLB miss | D-SFAR, D/UMMU Tag Access | 68₁₆ | 4
4. | data_access_exception | Several (see below) | D-SFAR, D/UMMU Tag Access* | 30₁₆ | 1
5. | fast_data_access_protection | Protection violation | D-SFAR, D/UMMU Tag Access | 6C₁₆ | 4
6. | privileged_action | Use of privileged ASI | D-SFAR | 37₁₆ | 1
7. | PA_watchpoint | Watchpoint hit | D-SFAR | 61₁₆ | 1
7b. | VA_watchpoint | Watchpoint hit | D-SFAR | 62₁₆ | 1
8. | mem_address_not_aligned, *_mem_address_not_aligned | Misaligned memory operation | impl. dep. #237-U3 | 34₁₆, 35₁₆, 36₁₆, 38₁₆, 39₁₆ | 1

* The contents of the context_id field of the DMMU Tag Access register are undefined after a data_access_exception.
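The three fast MMU traps each use four vectors, which is why their trap types are spaced four apart (64₁₆, 68₁₆, 6C₁₆). Assuming the standard SPARC V9 trap-table entry size of 32 bytes (8 instructions), this sketch checks that spacing and computes the room available to an inline handler. The structure, array, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Trap type (TT) and vector count for the fast MMU traps (TABLE 14-2). */
struct mmu_trap {
    const char *name;
    unsigned tt;      /* trap type */
    unsigned vectors; /* consecutive trap-table entries used */
};

static const struct mmu_trap fast_mmu_traps[] = {
    { "fast_instruction_access_MMU_miss", 0x64, 4 },
    { "fast_data_access_MMU_miss",        0x68, 4 },
    { "fast_data_access_protection",      0x6C, 4 },
};

#define TRAP_ENTRY_BYTES 32u /* one SPARC V9 trap-table entry: 8 instructions */

/* Bytes of inline handler code available to a trap using t->vectors entries. */
static unsigned handler_bytes(const struct mmu_trap *t)
{
    return t->vectors * TRAP_ENTRY_BYTES;
}

/* Returns 1 if no trap's vectors run into the next trap's first entry. */
static int vectors_do_not_overlap(const struct mmu_trap *t, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        if (t[i].tt + t[i].vectors > t[i + 1].tt)
            return 0;
    return 1;
}
```

Under that assumption, each fast MMU handler has 128 bytes (32 instructions) available before it must branch elsewhere, which suits the time-critical TLB miss path.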





14.6 MMU Internal Registers and ASI Operations


This section describes some of the MMU registers and how they are accessed:
• Partition ID register


14.6.1 Accessing MMU Registers 


All internal MMU registers can be accessed directly by the virtual processor through 
defined ASIs, using LDXA and STXA instructions. UltraSPARC Architecture- 
compatible processors do not require a MEMBAR #Sync, FLUSH, DONE, or RETRY 
instruction after a store to an MMU register for proper operation. 
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TABLE 14-3 lists the MMU registers and provides references to sections with more
details.


TABLE 14-3 MMU Internal Registers and ASI Operations

IMMU ASI | D/UMMU ASI | VA{63:0} | Access | Register or Operation Name
- | 21₁₆ | 8₁₆ | RW | Primary Context ID register
- | 21₁₆ | 10₁₆ | RW | Secondary Context ID register
50₁₆ | 58₁₆ | 30₁₆ | RW | I/D/U-TLB Tag Access registers
- | 58₁₆ | 80₁₆ | RW | Partition ID


14.6.2 Partition ID Register


ASI 58₁₆, VA 80₁₆


A partition ID is provided to allow multiple guest operating systems to share the 
same TLB. The partition ID register contents are compared in all TLB operations, 
such as demaps and translations, and are loaded into the PID field of the TLB tag 
during insertions. For more details on the partition ID, see Real Address Translation on 
page 520. 


IMPL. DEP. #416-S10: The size of partition ID fields in MMU partition registers is 
implementation-dependent and must be large enough to uniquely encode the 
identities of all virtual processors that share the TLB. 


The Partition ID register is defined in FIGURE 14-4, where partition_id is the 8-bit
partition ID.

   63                       8 7              0
  |                          | partition_id |

FIGURE 14-4 Partition ID Register
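A minimal sketch of extracting and building the register value, assuming the 8-bit field shown in FIGURE 14-4. Because IMPL. DEP. #416-S10 allows implementations to size the field differently, the mask would need adjusting for a particular processor. The macro and function names are hypothetical; on hardware, the value would be read or written with LDXA or STXA at ASI 58₁₆, VA 80₁₆.

```c
#include <assert.h>
#include <stdint.h>

#define PARTITION_ID_ASI  0x58 /* ASI of the Partition ID register */
#define PARTITION_ID_VA   0x80 /* its VA within that ASI */
#define PARTITION_ID_MASK 0xFFu /* 8-bit field, bits 7:0 (assumed width) */

/* Extract partition_id from a raw register value, ignoring bits 63:8. */
static uint8_t partition_id_get(uint64_t reg)
{
    return (uint8_t)(reg & PARTITION_ID_MASK);
}

/* Build a register value with bits 63:8 left as zero. */
static uint64_t partition_id_make(uint8_t pid)
{
    return (uint64_t)pid & PARTITION_ID_MASK;
}
```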
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CHAPTER 15


Chip-Level Multithreading (CMT) 





An UltraSPARC Architecture 2005 processor may include multiple virtual processors
on the same processor module to provide a dense, high-throughput system. This
may be achieved by having a combination of multiple physical processor cores and/or
multiple strands (threads) per physical processor core.


This chapter specifies a common interface between hardware and software for such 
products, referred to here as chip-level multithreaded processors (CMTs). It 
addresses issues common to CMT processors, regardless of the microarchitecture of 
the individual physical processor cores, in the following sections: 


• Overview of CMT on page 529.
• Accessing CMT Registers on page 533.
• CMT Registers on page 536.
• Disabling and Parking Virtual Processors on page 540.
• Reset and Trap Handling on page 550.
• Error Handling in CMT Processors on page 553.
• Additional CMT Software Interfaces on page 558.
• Performance Issues for CMT Processors on page 559.
• Recommended Subset for Single-Strand Processors on page 559.
• Machine State Summary on page 561.





15.1 Overview of CMT


A broad range of designs may fall under the definition of CMT. The interface
specified here is intended to provide a set of common behaviors that allow operating
system software and other privileged software to be common across UltraSPARC
Architecture 2005 processors. This interface is not complete, as a range of
implementation-dependent features will exist to configure and control these
processors.




The CMT programming model describes a set of privileged registers that are used 
for identification and configuration of CMT processors. Equally important, the CMT 
programming model describes certain behavior that is common across CMT 
implementations. The set of registers and the common behavior are covered in the 
following sections, grouped by topic. 


UltraSPARC Architecture 2005 processors that are not CMT processors (are single- 
threaded) should implement a subset of the CMT interface. This enables those 
virtual processors to be more easily integrated into products that may also contain 
CMT processors and also enables more consistent software to be deployed across 
future products. See Recommended Subset for Single-Strand Processors on page 559 for 
additional information on non-CMT processor implementations. 


15.1.1 CMT Definition 


An UltraSPARC Architecture 2005 CMT processor is defined by its externally-visible 
nature and not by its internal organization. The following section gives some 
background terminology, followed by a description of the CMT definition. 


15.1.1.1 Background Terminology 


The following definitions expand on the abbreviated definitions provided in 
Chapter 2, Definitions. 


Thread. Historically, the term thread is overused and ambiguous; software and
hardware have used it differently. From a software (operating system) perspective,
the term "thread" refers to an entity that:

• Can be executed on underlying hardware
• Is scheduled
• May or may not be actively running on hardware at any given time
• May migrate around the hardware of a system.


From the hardware perspective, the term “multithreaded processor" refers to a 
processor that can run multiple software threads simultaneously. 


To avoid confusion, the term “thread” in UltraSPARC Architecture 2005 is used 
exclusively in the manner that it is used by software (specifically, the operating 
system). A thread can be viewed in a practical sense as a Solaris™ process or 
lightweight process (LWP). 


Strand. The term strand refers to the state that hardware must maintain in order to 
execute a software thread. Specifically, a "strand" is the software-visible architected 
state (PC, NPC, general-purpose registers, floating-point registers, condition codes, 
status registers, ASRs, etc.) of a thread plus any microarchitecture state required by 
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hardware for its execution. "Strand" replaces the ambiguous term "hardware 
thread." The number of strands in a processor defines the number of threads that an 
operating system can schedule on that processor at any given time. 


Pipeline. The term pipeline refers to an execution pipeline. It is a loose term for the
basic collection of hardware needed to execute instructions. A pipeline may be used
by one or more strands in order to execute instructions from one or more
threads. Synonym: microcore.


Physical Core. The term physical processor core, or just physical core, is similar to the
term "pipeline" but represents a broader collection of hardware. A physical core
includes one or more execution pipelines and associated structures, such as caches,
that are required for executing instructions from one or more software threads. A
physical core contains one or more strands. The physical core provides the necessary
resources for the threads on each strand to make forward progress at a reasonable
rate. A multistranded physical core can execute multiple software threads by
time-multiplexing resources, partitioning resources, or any combination thereof.


The delineations among the terms strand, pipeline, and physical core are not precise. 
Among different microarchitecture organizations the scope of the terms may vary. In 
general, in a specific microarchitecture it will be apparent what constitutes a physical 
core. A physical core will be a highly integrated unit with a clearly defined interface 
to more distant levels of the memory hierarchy and the system interface unit. A 
physical core will contain a defined number of strands, that is, a maximum number 
of software threads that may be scheduled on it at any given time. 


Processor. A processor is the unit on which a shared interface is provided to control 
the configuration and execution of a collection of strands. A processor contains one 
or more physical cores, each of which contains one or more strands. Physically, a 
processor is a physical module that plugs into a system. A processor is expected to 
appear logically as a single agent on the system interconnect fabric. 


Therefore, a simple processor that can only execute one thread at a time (for 
example, an UltraSPARC I processor) would contain a single physical core which is 
single-stranded. A processor that follows the academic model of simultaneous 
multithreading (SMT) would contain a single physical core, where that physical core 
supports multiple strands in order to execute multiple simultaneous threads (multi- 
stranded physical core). A processor that follows the academic model of a chip 
multi-processor (CMP) would be a processor with multiple physical cores, each 
supporting only a single strand. A processor may also contain multiple physical 
cores, where each physical core is multi-stranded. 


Virtual Processor. The term virtual processor is used to identify each strand in a 
processor. Each virtual processor corresponds to a specific strand on a specific 
physical core, where multiple physical cores, each with multiple strands, may exist. 
In most respects a virtual processor appears to the system and to operating system 
software as a processing unit equivalent to a traditional single-stranded processor
(as in UltraSPARC I). Each virtual processor is capable of having interrupts directed 
specifically to it. At any given time, an operating system can have a different thread 
scheduled on each virtual processor. 


The UltraSPARC Architecture 2005 CMT architecture (software interface) described
in this chapter is independent of the specific method by which multiple virtual 
processors are implemented. The term "virtual processor" is generally used instead 
of "strand" because "strand" is commonly associated with multistranded physical 
cores. 


CPU. The term CPU is ambiguous in reference to processors with multiple virtual
processors: it could refer either to a virtual processor or to an entire processor.
Therefore, the term "CPU" is not used in this document.


CMT. CMT is an abbreviation for "Chip MultiThreading" or, as an adjective,
"Chip MultiThreaded". A CMT processor is a processor containing more than one
virtual processor.


15.1.1.2 CMT Definition 


CMT, as defined in UltraSPARC Architecture 2005, applies to all SPARC virtual 
processors. A processor containing a single virtual processor (strand) is a special 
case, covered in Recommended Subset for Single-Strand Processors on page 559. The 
CMT interface is the same whether multiple strands are provided by multiple 
physical cores, a single physical core with multiple strands, or multiple physical 
cores each with multiple strands. 


A virtual processor is a processing entity that can execute a software thread. A virtual 
processor has a number of key characteristics and includes all the architecturally 
visible state, as defined elsewhere in this specification, to execute a thread (general 
purpose registers, floating-point registers, process state, status registers, condition 
codes, etc.). A virtual processor is the smallest unit to which an interrupt can be 
delivered. The addressability of interrupts to individual virtual processors is a very 
important aspect of the CMT programming interface. An UltraSPARC Architecture 
2005 implementation must provide sufficient resources so that every virtual 
processor within the processor makes forward progress at a reasonable rate. 


Each virtual processor contains a separate instance of all user-visible architected
state; that is, nonprivileged architected state is per-virtual-processor.


The privileged and hyperprivileged architected state of a processor falls into four 
classes (described in Classes of CMT Registers on page 534), based on the degree of 
sharing among virtual processors. 
Implementation | The UltraSPARC Architecture 2005 applies to a single physical
Note | processor chip. In a multiple-chip system, the UltraSPARC
Architecture 2005 applies to each processor chip.


15.1.2 General CMT Behavior


In general, each virtual processor of a CMT processor behaves functionally as if it
was an independent processor. This is an important aspect of CMT processors
because user code running on a virtual processor does not need to know whether or
not that virtual processor is part of a CMT processor. At a high level, most
privileged code in an operating system can treat virtual processors of a CMT
processor as if each was an independent processor. Some software (for example,
boot, error-handling, and diagnostic software) must be aware that it is executing on
a CMT processor. This chapter deals chiefly with the interface between this software
and a CMT processor.


Each virtual processor of a CMT processor obeys the same memory model semantics 
as if it was an independent processor. All software designed to run in a 
multiprocessing environment, including thread libraries, must be able to operate on 
a CMT processor without modification. 


There are significant performance implications of CMT processors, especially when 
shared resources (such as caches) exist within a CMT processor. The virtual 
processors' proximity will potentially mean drastically different costs for 
communicating between two virtual processors on the same CMT processor 
compared to communicating between two virtual processors on different CMT 
processors. This adds another degree of non-uniform memory access (NUMA) to a 
system. For high performance, the operating system, and even some user 
applications, will want to program specifically for the NUMA nature of CMT 
processors. There may also be resource contention issues between virtual processors 
on the same CMT processor. Performance Issues for CMT Processors on page 559 
discusses some key performance issues related to CMT processors. 





15.2 Accessing CMT Registers


A key part of the CMT programming model is a set of hyperprivileged registers. This
section covers how these registers are organized and accessed. The registers can be
accessed by software running on a virtual processor of the CMT processor.
CMT-specific registers can be accessed by hyperprivileged software running on a
virtual processor, using Load and Store Alternate instructions (notably, LDXA and
STXA) that provide an address space identifier value and a (virtual) address. The
CMT programming model defines address space identifiers and associated virtual
addresses (VAs) for accessing the CMT-specific registers.


15.2.1 Classes of CMT Registers 


Nonprivileged architected state, including registers visible to nonprivileged 
software, is (or at least appears to be) per-virtual-processor. 


Privileged architected state, including registers visible to privileged software, is (or 
at least appears to be) per-virtual-processor. 


The hyperprivileged architected state of a processor falls into four categories: 


• Per-virtual-processor (per-strand) registers, of which each virtual processor has a
private (not shared) copy.


• Subset-shared registers, where a copy of each register is shared by a
non-overlapping subset of virtual processors.


• Per-physical-core shared registers (a special case of subset-shared registers),
where a copy of each register is shared by all virtual processors contained within
a physical core.


• Processor-shared CMT registers, in which a single copy of each register is shared
by all virtual processors in the processor.


Registers that are read-only in privileged mode (for example, TICK) need not be 
strictly implemented as per-virtual-processor registers; they may be implemented in 
one of the "shared" categories above, such that their shared nature is not visible to 
privileged software. 


CMT-specific registers of all classes can be accessed as ASI-mapped registers 
through hyperprivileged software running on a virtual processor. Software running 
on a given virtual processor can access: 


• all the per-virtual-processor registers belonging to the virtual processor on which
it is running

• the per-physical-core shared registers belonging to the physical core on which it is
running


• subset-shared registers for any group of virtual processors to which the virtual
processor on which it is running belongs


• all processor-shared registers
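The four accessibility rules above can be expressed as a single predicate. The following is a minimal C sketch, assuming that each register instance is identified by its sharing class plus an owner (a strand ID, a physical-core index, or a bit mask of member strands), and assuming the contiguous per-core strand numbering described under Exposing Stranding (core index = strand_id / strands per core). All type and function names are hypothetical; the sketch models only the normal case and ignores implementations in which hyperprivileged software may reach another core's registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four CMT register-sharing classes. */
enum cmt_scope {
    SCOPE_PER_STRAND,   /* private copy per virtual processor      */
    SCOPE_PER_SUBSET,   /* one copy per non-overlapping strand set */
    SCOPE_PER_CORE,     /* one copy per physical core              */
    SCOPE_PER_PROC      /* one copy for the whole processor        */
};

struct cmt_reg_instance {
    enum cmt_scope scope;
    unsigned owner_strand;   /* meaningful when scope == SCOPE_PER_STRAND */
    unsigned owner_core;     /* meaningful when scope == SCOPE_PER_CORE   */
    uint64_t member_mask;    /* meaningful when scope == SCOPE_PER_SUBSET */
};

/* True if software on strand `strand_id` (< 64) may access `reg`.
 * strands_per_core is STRAND_ID.max_strand_id_per_core + 1 (assumed
 * contiguous, aligned numbering of strands within each core). */
bool cmt_can_access(const struct cmt_reg_instance *reg,
                    unsigned strand_id, unsigned strands_per_core)
{
    switch (reg->scope) {
    case SCOPE_PER_STRAND:
        return reg->owner_strand == strand_id;
    case SCOPE_PER_CORE:
        return reg->owner_core == strand_id / strands_per_core;
    case SCOPE_PER_SUBSET:
        return ((reg->member_mask >> strand_id) & 1u) != 0;
    case SCOPE_PER_PROC:
        return true;
    }
    return false;
}
```

For example, with 4 strands per core, strand 5 lies on core 1, so it can reach core 1's per-core registers but not core 0's.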


1. Currently, no architectural CMT registers fall into this category. It is defined here for completeness, because
registers in this category may need to exist as implementation-specific registers.


In nonprivileged or privileged mode, it is normally not possible for a virtual 
processor on one physical core to address (much less, read) the per-physical-core 
registers of another physical core. On some implementations it may be possible for 
a virtual processor on one physical core to address the per-physical-core registers of 
another physical core, but only in hyperprivileged mode or if hyperprivileged 
software grants such privileges to software running at a lower privilege level. 


The semantics for accessing the CMT registers through the ASI interface are 
described in Accessing CMT Registers Through ASIs on page 535. 


15.2.2 Accessing CMT Registers Through ASIs


Each CMT-specific register is accessible through a restricted ASI (accessible only in 
hyperprivileged software). The ASI number and virtual address corresponding to 
each CMT register are described later in this chapter. 


Each virtual processor can access the per-physical-core CMT registers associated 
with that virtual processor. The implementation must guarantee that accesses to per- 
physical-core registers follow sequential semantics on the virtual processor with 
which they are associated. 


Each virtual processor can access all the per-processor shared CMT registers on its 
processor. An update to a per-processor shared register from one virtual processor 
will be visible to all other virtual processors that share that register. The ordering of 
accesses to per-processor shared registers from different virtual processors is not 
defined, but an implementation must guarantee that: 


• Accesses to a shared register from the same virtual processor follow sequential
semantics.


• If multiple virtual processors attempt to store to a shared CMT register at the
same time, the value observed in (readable from) the register will always be that
written by one of those stores. That is, a store to a CMT register must be
performed atomically on all bits of the register. In the case of the
STRAND_RUNNING register, there is a third possibility: a write to the register may
be dropped (ignored) entirely in certain situations (for details, see Simultaneous
Updates to the STRAND_RUNNING Register on page 546).


There may be additional implementation-enforced restrictions on updates to some 
CMT registers. 


All CMT registers are 64-bit registers, although some of the bits of individual
registers can be reserved or defined to contain a fixed value in a given
implementation. Software should always write reserved register fields either with
the values previously read from those fields or with zeroes, and hardware should
return zero when those fields are read (see Reserved Opcodes and Instruction Fields
on page 134). Software intended to run on future CMT implementations should not
assume that these fields will read as 0 or any other particular value. This convention
simplifies future expansion of the CMT interface.
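The reserved-field convention amounts to a read-modify-write in which only the bits of the field being changed are replaced. A minimal C sketch, assuming the caller has already read the register (in real hyperprivileged code, with LDXA) and knows which bits make up the field; `cmt_merge_field` is a hypothetical helper name.

```c
#include <stdint.h>

/* Compute the value to store into a CMT register when only the bits in
 * field_mask change. All other bits, including reserved ones, keep the
 * value previously read from the register. */
uint64_t cmt_merge_field(uint64_t old_value, uint64_t field_mask,
                         uint64_t new_field_bits)
{
    return (old_value & ~field_mask) | (new_field_bits & field_mask);
}
```

Preserving the previously read value, rather than writing zeroes, keeps the code correct on a future implementation that assigns meaning to bits that are reserved today.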


A CMT register is accessed through load and store instructions, using a defined ASI 
number and virtual address. CMT registers can only be accessed in hyperprivileged 
mode. An attempt to access a CMT register in nonprivileged or privileged mode 
results in a privileged action exception. 


Only the LDXA or LDDFA instruction can be used to read a CMT register, and only the
STXA or STDFA instruction can be used to store to one. An attempt to access a CMT
register with any other instruction results in a data access exception. An attempt to
write to a read-only CMT register with an STXA instruction results in a data access
exception (invalid ASI).
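The access rules in the two preceding paragraphs can be summarized as a classification function. This is a minimal C sketch with hypothetical enum and function names. It assumes that the privilege check is applied before the instruction check, and that a store to a read-only register traps whether it uses STXA (as stated above) or STDFA; neither ordering nor the STDFA case is specified in the text.

```c
#include <stdbool.h>

enum cpu_mode { MODE_NONPRIV, MODE_PRIV, MODE_HYPERPRIV };
enum access_insn { INSN_LDXA, INSN_LDDFA, INSN_STXA, INSN_STDFA, INSN_OTHER };
enum access_result { ACCESS_OK, TRAP_PRIVILEGED_ACTION, TRAP_DATA_ACCESS };

/* Classify an attempted CMT register access. */
enum access_result cmt_check_access(enum cpu_mode mode, enum access_insn insn,
                                    bool register_is_read_only)
{
    if (mode != MODE_HYPERPRIV)          /* nonprivileged or privileged */
        return TRAP_PRIVILEGED_ACTION;
    switch (insn) {
    case INSN_LDXA:
    case INSN_LDDFA:
        return ACCESS_OK;
    case INSN_STXA:
    case INSN_STDFA:
        /* assumption: STDFA is treated like STXA for read-only registers */
        return register_is_read_only ? TRAP_DATA_ACCESS : ACCESS_OK;
    default:                             /* any other load/store form */
        return TRAP_DATA_ACCESS;
    }
}
```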





15.3 CMT Registers 


This section describes the registers used to control the operation of a processor in a
CMT implementation. For each register defined in this document, a six-column
quick-reference table is provided that specifies the key attributes of the register, as
follows:





Column Heading   Meaning of Column Contents
Register Name    The name of the CMT register
ASI # (Name)     The address space identifier number used for accessing the
                 register from software running on the CMT processor (and the
                 recommended ASI name for use in assembly-language
                 hyperprivileged software)
VA               The virtual address used for accessing the register from
                 software running on the CMT processor
Scope            The scope of sharing for the register: whether the register is a
                 "per-virtual-processor" (per-strand) register, or a single instance
                 of a register that is "shared" among the virtual processors within
                 a physical core (per-core), "shared" among a subset of virtual
                 processors within a physical core (per-subset), or "shared" among
                 all the virtual processors within a processor (per-proc)
Access           Whether software access to the register is read/write (RW),
                 read-only (R only), write-only (W only), Write-1-to-Set (W1S), or
                 Write-1-to-Clear (W1C)
Note             Any additional information
15.3.1 Strand ID Register (STRAND_ID)











Register Name   ASI # (Name)                VA     Scope        Access   Note
STRAND_ID       63₁₆ (ASI_CMT_PER_STRAND)   10₁₆   per-strand   R only


FIGURE 15-1 STRAND_ID Register (field boundaries at bits 63:38, 37:32, 31:22, 21:16, 15:6, and 5:0)


STRAND_ID is a read-only, per-virtual-processor register that holds the ID value
assigned by hardware to each implemented virtual processor. The ID value is unique
within the CMT processor.


As shown above, the STRAND_ID register has three fields:


1. strand_id, which represents this virtual processor's number, as assigned by
hardware. The strand ID is encoded in 6 bits.


2. max_strand_id, which is the bit-position index (bit number) of the most
significant '1' bit in the STRAND_AVAILABLE register. This is the strand ID of the
highest-numbered implemented virtual processor in this CMT processor.


3. max_strand_id_per_core, which specifies the number of strands implemented on
each physical core, minus one. For a single-stranded processor,
max_strand_id_per_core is 0.
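The relationships stated for max_strand_id and max_strand_id_per_core can be checked with a little arithmetic. A minimal C sketch, assuming the STRAND_AVAILABLE value and the max_strand_id_per_core field have already been read and extracted; the function names are hypothetical, and the sketch does not decode the raw STRAND_ID register layout.

```c
#include <stdint.h>

/* Bit number of the most significant '1' in a STRAND_AVAILABLE value,
 * i.e. the value STRAND_ID.max_strand_id should report.
 * Returns -1 if the value is zero (no strand available). */
int cmt_max_strand_id(uint64_t strand_available)
{
    int bit = -1;
    while (strand_available != 0) {
        strand_available >>= 1;
        bit++;
    }
    return bit;
}

/* max_strand_id_per_core encodes (strands per physical core) - 1. */
unsigned cmt_strands_per_core(unsigned max_strand_id_per_core)
{
    return max_strand_id_per_core + 1;
}
```

For instance, a STRAND_AVAILABLE value of 0xFF (strands 0 through 7 present) corresponds to max_strand_id = 7.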


Many other CMT-specific registers provide a bit mask in which each bit corresponds
to an individual virtual processor. For these registers, the strand_id field indicates
which bit of a bit mask corresponds to this specific virtual processor.
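In other words, strand_id is a bit index into such masks (for example STRAND_AVAILABLE, STRAND_ENABLE_STATUS, or STRAND_RUNNING). A minimal C sketch, assuming the mask register and strand_id have already been read; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting this virtual processor's bit (strand_id must be < 64). */
uint64_t cmt_strand_bit(unsigned strand_id)
{
    return UINT64_C(1) << strand_id;
}

/* True if this virtual processor's bit is set in `mask`. */
bool cmt_strand_in_mask(uint64_t mask, unsigned strand_id)
{
    return (mask & cmt_strand_bit(strand_id)) != 0;
}
```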


Strand Numbering Convention. The numbering of virtual processors (strands) 
may or may not be contiguous; system software may only assume that each strand 
ID is unique within a CMT processor. In general, virtual processors should be 
numbered in a sequential, contiguous series starting with strand number 0. When 
numbering the virtual processors within a CMT processor, this convention appears 
straightforward. There are cases, however, where this might not be so simple. This 
numbering convention is recommended but not required. 


In a CMT processor designed with many virtual processors, some physical cores in a 
manufactured CMT processor may fail to function correctly. It is likely that there 
would be a desire to salvage a partially good CMT processor (one where a subset of 
the virtual processors and all the common area function correctly) and use it as a 
CMT processor with fewer than the maximum number of functional virtual 
processors. In such a case, the functional strands could be numbered contiguously,
starting from 0, with the STRAND_ID.max_strand_id field set to the ID of the
highest-numbered functional virtual processor. This requires some way to reassign
the identity of individual virtual processors after manufacturing. If this is not
practical, the functioning virtual processors may not be contiguously numbered.
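One way to picture the contiguous renumbering is as a rank function over the mask of functional physical strands: the n-th functional strand, counting up from bit 0, receives logical ID n. This is a minimal C sketch of that idea only; the mapping, the `functional_mask` input, and the function name are assumptions, not a mechanism defined by this specification. It also ignores the per-core alignment constraint described under Exposing Stranding, which a real renumbering of multi-stranded cores would have to respect.

```c
#include <stdint.h>

/* Hypothetical renumbering: return the contiguous logical strand ID for
 * physical strand `physical_strand`, or -1 if that strand is not functional. */
int cmt_contiguous_id(uint64_t functional_mask, unsigned physical_strand)
{
    if (physical_strand >= 64 || !((functional_mask >> physical_strand) & 1u))
        return -1;
    int id = 0;
    for (unsigned i = 0; i < physical_strand; i++)
        id += (int)((functional_mask >> i) & 1u);   /* count functional strands below */
    return id;
}
```

With functional physical strands {0, 2, 3} (mask 0xD), the logical IDs become 0, 1, and 2, so max_strand_id would be 2.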


15.3.1.1 Exposing Stranding 


If a processor implements multiple strands per physical core, the stranding is
exposed in STRAND_ID.max_strand_id_per_core. This field encodes one less than
the number of strands that are implemented on the physical processor core; for
example, on a physical core with 4 strands,
STRAND_ID.max_strand_id_per_core = 3. Every virtual processor within the
physical core must observe the same value of max_strand_id_per_core. An
implementation defines and counts strands and physical processor cores as
appropriate for that implementation.








When STRAND_ID.max_strand_id_per_core is nonzero, there are additional
constraints on the numbering of virtual processors. Virtual processors that
correspond to strands on the same physical processor core must have contiguous
STRAND_ID.strand_id values, with the lowest-numbered virtual processor on a
physical core having a strand_id value that is a multiple of the number of strands on
each physical core.
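Under this constraint, the physical core that holds a strand is simply strand_id divided by the number of strands per core. The constraint itself can be verified for one core's set of IDs as sketched below. This is minimal C with hypothetical function names, assuming the caller supplies exactly strands_per_core IDs (at most 64) for the core being checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* strands_per_core = STRAND_ID.max_strand_id_per_core + 1 */
unsigned cmt_core_of(unsigned strand_id, unsigned strands_per_core)
{
    return strand_id / strands_per_core;
}

/* Check one physical core's strand IDs (any order, strands_per_core <= 64):
 * they must be contiguous, and the lowest must be a multiple of
 * strands_per_core. */
bool cmt_core_numbering_ok(const unsigned *ids, unsigned strands_per_core)
{
    unsigned lowest = ids[0];
    for (unsigned i = 1; i < strands_per_core; i++)
        if (ids[i] < lowest)
            lowest = ids[i];
    if (lowest % strands_per_core != 0)
        return false;                         /* misaligned first strand */
    uint64_t seen = 0;
    for (unsigned i = 0; i < strands_per_core; i++) {
        unsigned off = ids[i] - lowest;
        if (off >= strands_per_core || ((seen >> off) & 1u))
            return false;                     /* gap or duplicate ID */
        seen |= UINT64_C(1) << off;
    }
    return true;
}
```

For example, with 4 strands per core, {4, 5, 6, 7} is a valid numbering for core 1, while {2, 3, 4, 5} is not, because 2 is not a multiple of 4.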





It is important to expose stranding to software. From a performance standpoint,
stranding must be exposed for the operating system to understand resource sharing
and contention issues and to optimally schedule software threads on the processor.
From a power management perspective, knowledge of stranding enables the facility
to park or disable all strands on a physical core to obtain significant power savings.


15.3.2 Strand Interrupt ID Register (STRAND_INTR_ID) 


Register Name    ASI # (Name)                VA     Scope        Access   Note
STRAND_INTR_ID   63₁₆ (ASI_CMT_PER_STRAND)   00₁₆   per-strand   RW


FIGURE 15-2 STRAND_INTR_ID Register (bits 63:16 reserved; bits 15:0 int_id)





The STRAND_INTR_ID register allows software to assign a 16-bit interrupt ID,
unique within a system, to each virtual processor. This is necessary in order to
enable virtual processors to receive interrupts. The identifier in this register is used
by other virtual processors (on the same and different CMT processors) and other
bus agents to address interrupts to this specific virtual processor. It can also be used
by this virtual processor to identify itself as the source of an interrupt it sends to
other virtual processors and bus agents.


This register is Read/Write, accessible only in hyperprivileged mode 
(HPSTATE.hpriv = 1). It is expected that it will be modified only at boot or 
reconfiguration time. An attempt to access this register in privileged mode or 
nonprivileged mode results in a privileged action exception. 


The STRAND_INTR_ID register has only one field, a 16-bit interrupt ID field named
int_id.


If an implementation uses fewer than 16 bits for its interrupt ID, the unused bits 
read as zero and writes to them are ignored. 
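The effect of a write on an implementation with a narrower interrupt ID can be modeled as a mask. A minimal C sketch, assuming the implementation implements only the low `impl_bits` bits of int_id, that the reserved bits 63:16 read as zero, and that no portion of int_id is read-only (the implementation-dependent case described next is not modeled). The function name is hypothetical.

```c
#include <stdint.h>

#define STRAND_INTR_ID_INT_ID_MASK UINT64_C(0xFFFF)  /* int_id, bits 15:0 */

/* Value read back from STRAND_INTR_ID after software writes `written`,
 * on an implementation that implements the low impl_bits (1..16) bits. */
uint64_t strand_intr_id_after_write(uint64_t written, unsigned impl_bits)
{
    uint64_t implemented = (UINT64_C(1) << impl_bits) - 1;
    return written & STRAND_INTR_ID_INT_ID_MASK & implemented;
}
```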


IMPL. DEP. #: It is implementation dependent whether any portion of the int_id field
of the STRAND_INTR_ID register is read-only (see the following subsection,
Assigning an Interrupt ID).


15.3.2.1 Assigning an Interrupt ID 


When assigning the interrupt ID to a virtual processor, software must be aware of 
interrupt routing conventions used in the system. Some portion of the interrupt ID 
might be required to follow a hardware convention to enable the interrupt to be 
correctly routed through the system interconnect. In some implementations, a part of 
the interrupt ID can be fixed by the processor to correspond to the strand ID. This 
portion of the interrupt ID can be read-only in the STRAND_INTR_ID register. Such
requirements are both processor- and system-platform-specific. 


Each virtual processor in the CMT processor must have an interrupt ID that is 
unique within the system. If the interrupt IDs of multiple virtual processors in the
same system are set to the same value, the behavior of the processor is undefined
when an interrupt specifying that ID is sent or received.
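Boot or reconfiguration software that plans interrupt ID assignments could verify uniqueness before writing any STRAND_INTR_ID register. A minimal C sketch, assuming the planned IDs for every virtual processor in the system have been collected into an array; the function name is hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every planned 16-bit interrupt ID in ids[0..n-1] is distinct. */
bool intr_ids_unique(const uint16_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ids[i] == ids[j])
                return false;   /* duplicate: delivery would be undefined */
    return true;
}
```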


15.3.2.2 Dispatching and Receiving Interrupts


The mechanisms used to dispatch and receive interrupts must work with the 
interrupt ID register. A processor’s interrupt dispatch mechanism must be able to 
specify the interrupt ID of the destination virtual processor to which the interrupt is 
to be delivered. When a destination interrupt ID is specified, the interrupt must be 
delivered to the virtual processor that has the matching ID in its STRAND_INTR_ID 
register. 
15.3.2.3 Updating the Strand Interrupt ID Register


It is expected that the interrupt ID register of a virtual processor will be written once 
by software, when a virtual processor is initially booted. It is assumed that while a 
virtual processor is being booted, there will be no interrupt traffic in the system. 


The latency from when software writes to STRAND_INTR_ID to when the write takes 
effect is implementation dependent. Use of a MEMBAR #Sync instruction after a 
write to STRAND_INTR_ID will cause the write to become visible before any 
instructions after the MEMBAR are executed on the virtual processor. 


Updates to STRAND_INTR_ID are atomic: if STRAND_INTR_ID is written, the value 
observed at any time will be either the old value or the new value; no transient value 
will be observed. If an interrupt is issued to a virtual processor while its interrupt ID 
register is being updated (addressed either to its old or new interrupt ID), the 
interrupt may or may not be received by the virtual processor. Once a virtual 
processor acknowledges an interrupt using its new interrupt ID, it will not 
acknowledge any interrupts addressed to the old interrupt ID. 


If an interrupt is issued to a system, addressed to an interrupt ID that does not 
match any virtual processors or other system agents, the interrupt will not be 
acknowledged and will be dropped. 
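The delivery rule (deliver to the agent whose interrupt ID matches, and drop the interrupt if none matches) can be modeled as a lookup. A minimal C sketch, assuming a flat table of interrupt IDs for all virtual processors and other agents, with IDs already unique; the function name and table layout are hypothetical and say nothing about how a real interconnect routes interrupts.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the agent whose interrupt ID equals dest_id, or -1 if no agent
 * matches (the interrupt is not acknowledged and is dropped). */
long route_interrupt(const uint16_t *agent_int_ids, size_t n_agents,
                     uint16_t dest_id)
{
    for (size_t i = 0; i < n_agents; i++)
        if (agent_int_ids[i] == dest_id)
            return (long)i;
    return -1;
}
```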





15.4 Disabling and Parking Virtual Processors


The CMT programming model provides the ability to disable virtual processors and 
temporarily suspend (park) virtual processors. This section describes the interface 
for probing what virtual processors are available, enabled, and running (not parked). 
This section also describes the interface for enabling/disabling virtual processors 
and parking/unparking virtual processors. 
15.4.1 Strand Available Register (STRAND_AVAILABLE)


Register Name      ASI # (Name)            VA     Scope      Access   Note
STRAND_AVAILABLE   41₁₆ (ASI_CMT_SHARED)   00₁₆   per-proc   R only


FIGURE 15-3 STRAND_AVAILABLE Register (bits 63:0: Strand Available bits)


The STRAND_AVAILABLE register is a shared (one per processor) register that
indicates which virtual processors are available for use (that is, are present and
functional) in a CMT implementation.


The STRAND_AVAILABLE register is read-only, comprising a single 64-bit field. As
illustrated in FIGURE 15-3, bit n corresponds to virtual processor n; therefore, up to
64 virtual processors are supported per CMT processor. If a bit in the register is 1,
the corresponding virtual processor is available for use in the CMT processor. If a bit
in the register is 0, the corresponding virtual processor is not available for use. An
"available" virtual processor is one that is present and functional, and therefore can
be enabled and used.


15.4.2 Enabling and Disabling Virtual Processors


The CMT programming model allows virtual processors to be enabled and disabled. 
Enabling or disabling a virtual processor is a heavyweight operation that in most
cases requires either a power on reset (POR) or a warm reset (WRM) to take effect.
A disabled virtual processor produces no architectural effects observable by other 
virtual processors, and does not participate in cache coherency. The behavior of any 
transaction (such as an interrupt) issued to a disabled virtual processor is undefined. 


IMPL. DEP. #322-U4: Whether disabling a virtual processor reduces the power used 
by a CMT is implementation dependent. It is recommended that a disabled virtual 
processor consume a minimal amount of power. 


IMPL. DEP. #423-S10: Whether disabling a virtual processor increases the 
performance of other virtual processors in the CMT is implementation dependent. 
15.4.2.1 Strand Enable Status Register (STRAND_ENABLE_STATUS)


Register Name          ASI # (Name)            VA     Scope      Access   Note
STRAND_ENABLE_STATUS   41₁₆ (ASI_CMT_SHARED)   10₁₆   per-proc   R only


FIGURE 15-4 STRAND_ENABLE_STATUS Register (bits 63:0: Strand Enable Status bits)





The STRAND_ENABLE_STATUS register is a shared (one per processor) register
that indicates which virtual processors are currently enabled. The register is a
read-only register, in which each bit corresponds to a virtual processor.


As shown in FIGURE 15-4, bit n corresponds to virtual processor n. If a bit in the
STRAND_ENABLE_STATUS register is 1, the corresponding virtual processor is
available and enabled. A virtual processor indicated as "not available" in the
STRAND_AVAILABLE register cannot be enabled, and its corresponding enabled bit
in this register will be 0. An available, enabled virtual processor that is parked is still
considered enabled.


Programming | Hyperprivileged software should never set bit 
Note | STRAND_ENABLE{n} to 1 if STRAND_AVAILABLE{n} = 0. 


State After Reset. The STRAND_ENABLE_STATUS register changes due to a
power on reset (POR) or a warm reset (WRM). During a power on reset, the
contents of the STRAND_AVAILABLE register are copied to the
STRAND_ENABLE_STATUS register. During a warm reset, the contents of the
STRAND_ENABLE register are copied to the STRAND_ENABLE_STATUS register.


15.4.2.2 Strand Enable Register (STRAND_ENABLE)


Register Name   ASI # (Name)            VA     Scope      Access   Note
STRAND_ENABLE   41₁₆ (ASI_CMT_SHARED)   20₁₆   per-proc   RW       Changes take effect during reset


FIGURE 15-5 STRAND_ENABLE Register (bits 63:0: Strand Enable bits)


542 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


The STRAND_ENABLE register is a shared (one per processor) register, used by
software to enable and disable a CMT's virtual processors. When disabled, a virtual
processor and any structures private to that virtual processor behave as though they
were not present.


Programming | When re-enabled, per-strand architectural state that existed 
Note | when the virtual processor was previously enabled should be 
assumed to be lost. Therefore, hyperprivileged software must 
initialize any needed per-strand architectural state each time a 
virtual processor is enabled. 


Changing a bit in the STRAND_ENABLE register does not take effect (cause a virtual
processor to be enabled or disabled) immediately. Instead, it indicates a pending
change to the STRAND_ENABLE_STATUS register, which will not take effect until
the next warm reset (WRM), at which time the contents of the STRAND_ENABLE
register are copied to the STRAND_ENABLE_STATUS register. A change in the
STRAND_ENABLE register may also take effect at some other
implementation-dependent time (an implementation dependency; see Dynamically
Enabling/Disabling Virtual Processors on page 544).


As shown in FIGURE 15-5, the STRAND_ENABLE register contains one bit per
possible virtual processor, with bit n corresponding to virtual processor n. If bit n is
1, then virtual processor n should be enabled after the next warm reset (if that
virtual processor is available). If bit n is 0, then virtual processor n should be
disabled after the next warm reset.


When bit n in the STRAND_AVAILABLE register is 0 (the virtual processor is
unavailable), the corresponding bit (bit n) in the STRAND_ENABLE register is
forced to 0 and attempts to write "1" to bit n in the STRAND_ENABLE register are
ignored.


Restrictions on Updating the STRAND_ENABLE Register.


IMPL. DEP. #323-U4: Whether an implementation provides a restriction that
prevents software from writing a value of all zeroes (or zeroes corresponding to all
available virtual processors) to the STRAND_ENABLE register is implementation
dependent. This restriction avoids the dangerous case where all virtual processors
become disabled and the only way to enable any virtual processor is a hard
power on reset (a warm reset would not suffice). If such a restriction is
implemented and software running on any virtual processor attempts to write a
value of all zeroes (or zeroes corresponding to all available virtual processors) to the
STRAND_ENABLE register, hardware forces the STRAND_ENABLE register to an
implementation-dependent value which enables at least one of the available virtual
processors.
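The write rules for STRAND_ENABLE (bits of unavailable strands forced to 0, plus the optional all-zeroes restriction) can be modeled as follows. This is a minimal C sketch with a hypothetical function name. When the restriction applies, the hardware value is implementation dependent; enabling the lowest-numbered available strand is purely an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value held in STRAND_ENABLE after software stores `written`. */
uint64_t strand_enable_after_write(uint64_t written, uint64_t available,
                                   bool has_all_zero_guard)
{
    uint64_t value = written & available;   /* unavailable bits forced to 0 */
    if (has_all_zero_guard && value == 0 && available != 0)
        value = available & (~available + 1);  /* assumed: lowest available strand */
    return value;
}
```

Without the guard, a store of all zeroes leaves every strand disabled after the next warm reset, which is exactly the case the restriction exists to prevent.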
State After Reset. Upon assertion of power on reset, the value of the
STRAND_AVAILABLE register is copied to the STRAND_ENABLE register. The
STRAND_ENABLE register does not change during any other reset, including
system (or equivalent) resets.
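Taken together with the State After Reset rules for STRAND_ENABLE_STATUS, the reset behavior of the three registers can be modeled as shown below. This is a minimal C sketch; the struct and function names are hypothetical, and the implementation-dependent dynamic enable/disable path is not modeled.

```c
#include <stdint.h>

struct cmt_enable_regs {
    uint64_t available;      /* STRAND_AVAILABLE (read-only)        */
    uint64_t enable;         /* STRAND_ENABLE (pending value)       */
    uint64_t enable_status;  /* STRAND_ENABLE_STATUS (current)      */
};

/* Power on reset: both registers take the STRAND_AVAILABLE value. */
void cmt_power_on_reset(struct cmt_enable_regs *r)
{
    r->enable = r->available;
    r->enable_status = r->available;
}

/* Warm reset: the pending STRAND_ENABLE value takes effect;
 * STRAND_ENABLE itself is unchanged. */
void cmt_warm_reset(struct cmt_enable_regs *r)
{
    r->enable_status = r->enable;
}
```

A write to STRAND_ENABLE between resets therefore changes only the `enable` field; `enable_status` catches up at the next warm reset.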


15.4.2.3 Dynamically Enabling/Disabling Virtual Processors 


IMPL. DEP. #424-S10: Whether a CMT implementation provides the ability to 
dynamically enable and disable virtual processors is implementation dependent. It is 
tightly coupled to the underlying microarchitecture of a specific CMT 
implementation. This feature is implementation dependent because any 
implementation-independent interface would be too inefficient on some 
implementations. 


15.4.3 Parking and Unparking Virtual Processors


Parking is a way to temporarily suspend the operation of a virtual processor,
intended for use by critical diagnostic and recovery code. A parked virtual processor
can later be unparked to allow it to resume running. A virtual processor can be
parked or unparked at arbitrary times using the STRAND_RUNNING register; a
WRM or POR reset is not required for parking or unparking to become effective. The
STRAND_RUNNING_STATUS register can be used to determine whether a virtual
processor that has been directed to park has completed the process of parking.


A parked virtual processor does not execute instructions and does not initiate any 
transactions on its own. If any portion of the memory system resides in a parked 
virtual processor, it will continue to be updated as necessary for it to remain 
coherent with the rest of the memory system while the virtual processor is parked. 


When a virtual processor is unparked, it continues execution with the instruction 
that was next to be executed when the virtual processor was parked. It is transparent 
to software running on a virtual processor that it was ever parked (except for 
observable timing considerations). 


While a virtual processor is parked, the STICK register continues to count. 


IMPL. DEP. #425-S10: It is implementation dependent whether the TICK register 
continues to count while a virtual processor is parked. 


Using the TICK or STICK counter to detect the parking of a virtual processor is not 
recommended. 


An interrupt to a parked virtual processor behaves the same as if the virtual 
processor was too busy to accept the interrupt. 
IMPL. DEP. #324-U4: It is implementation dependent whether parking a virtual
processor reduces the power used by a CMT. It is recommended that a parked 
virtual processor use a reduced amount of power. 


Parking a virtual processor should, when appropriate, reduce the contention for 
shared resources and enable other virtual processors to potentially run faster. 


IMPL. DEP. #426-S10: The degree to which parking a virtual processor impacts the 
performance of other virtual processors is implementation dependent. 


Implementation | One possible way to implement virtual processor parking is to 
Note | disable instruction fetching in a parked virtual processor. In 
such an implementation, after a virtual processor is parked, it 
will execute the instructions currently in its pipeline, complete 
pending transactions (such as draining the store queue), and 
then become idle (at which time, its status in the
STRAND_RUNNING_STATUS register will change from
"Running" to "Parked").





15.4.3.1 Strand Running Register (STRAND_RUNNING)


Register Name        ASI # (Name)            VA     Scope      Access   Note
STRAND_RUNNING_RW    41₁₆ (ASI_CMT_SHARED)   50₁₆   per-proc   RW       General RW access
STRAND_RUNNING_W1S   41₁₆ (ASI_CMT_SHARED)   60₁₆   per-proc   W1S      Write 1s to set bits
STRAND_RUNNING_W1C   41₁₆ (ASI_CMT_SHARED)   68₁₆   per-proc   W1C      Write 1s to clear bits


FIGURE 15-6 STRAND_RUNNING Register (bits 63:0: Strand Running bits)





STRAND_RUNNING is a shared (one per processor) register, used by software to 
park and unpark selected virtual processors in a CMT implementation. When a 
virtual processor is parked, the virtual processor stops executing new instructions 
and will not initiate new transactions except in response to a coherency transaction 
initiated by another virtual processor. 


IMPL. DEP. #427-S10: There may be an arbitrarily long, but bounded, delay (“skid”) 
from the time when a virtual processor is directed to park or unpark (via an update 
to the STRAND_RUNNING register) until the corresponding virtual processor(s) 
actually park or unpark. 


Multiple access methods are provided for writing bits in the STRAND_RUNNING 
register, distinguished by the virtual address used (listed above): 


• STRAND_RUNNING_RW, for normal reading and writing of the entire register 


• STRAND_RUNNING_W1S (“Write 1 to Set”), where writing ‘1’ to a bit sets the 
destination bit to ‘1’ and writing ‘0’ to a bit leaves the destination bit unchanged 


• STRAND_RUNNING_W1C (“Write 1 to Clear”), where writing ‘1’ to a bit sets the 
destination bit to ‘0’ (clears it) and writing ‘0’ to a bit leaves the destination bit 
unchanged 


A specific value can be atomically written to all bits of the STRAND_RUNNING 
register, using STRAND_RUNNING_RW, or bits can be individually modified, using 
STRAND_RUNNING_W1S or STRAND_RUNNING_W1C. When a virtual processor 
parks itself, software should write to STRAND_RUNNING_W1C. When a virtual 
processor wants to become the only active virtual processor (parking all other 
virtual processors in the CMT), it is more appropriate to write the desired value 
directly to STRAND_RUNNING_RW. A direct write eliminates the need to perform 
separate set and clear operations to write a specific value to the register. 
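The three access methods can be modeled as pure bitwise functions on a 64-bit value. The following C sketch is illustrative only. The function names are hypothetical, and the sketch keeps the register contents in ordinary memory instead of issuing STXA instructions to ASI_CMT_SHARED. It also ignores the STRAND_ENABLE_STATUS masking and the rule that at least one virtual processor must remain unparked.

```c
#include <stdint.h>

/* Hypothetical host-side model of the three STRAND_RUNNING access
 * methods. It reproduces only the bitwise semantics described in the
 * text and does not touch real hardware. */

/* STRAND_RUNNING_RW: the whole register is replaced by value. */
uint64_t strand_running_write_rw(uint64_t current, uint64_t value)
{
    (void)current;              /* previous contents do not matter */
    return value;
}

/* STRAND_RUNNING_W1S: each 1 bit in value sets the matching bit;
 * 0 bits leave the destination bit unchanged. */
uint64_t strand_running_write_w1s(uint64_t current, uint64_t value)
{
    return current | value;
}

/* STRAND_RUNNING_W1C: each 1 bit in value clears the matching bit;
 * 0 bits leave the destination bit unchanged. */
uint64_t strand_running_write_w1c(uint64_t current, uint64_t value)
{
    return current & ~value;
}
```

For example, when virtual processor 3 parks itself, it does a W1C write of `1 << 3`, which leaves every other bit untouched. A virtual processor n that wants to be the only one running would instead write `1 << n` through the RW address.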


As shown in FIGURE 15-6, the STRAND_RUNNING register contains one bit per 
possible virtual processor, with bit n corresponding to virtual processor n. Writing a 
value of 1 to bit position n activates (unparks) virtual processor n for normal 
execution, while writing a value of 0 to bit n parks virtual processor n. If bit n in the 
STRAND_ENABLE_STATUS register is 0 (not enabled), hardware forces the 
corresponding bit in the STRAND_RUNNING register to 0, and attempts to write to 
that bit are ignored. 
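The bit-per-virtual-processor layout and the forcing of disabled bits can be expressed as a small C sketch. It assumes that the value being written has already been computed (for example, by one of the three access methods) and that `enable_status` holds the current STRAND_ENABLE_STATUS contents. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the value that STRAND_RUNNING holds after a write.
 * Bits of disabled virtual processors are forced to 0, so writing 1 to
 * such a bit has no effect. */
uint64_t strand_running_apply(uint64_t written_value, uint64_t enable_status)
{
    return written_value & enable_status;
}

/* Bit n corresponds to virtual processor n: 1 = unparked, 0 = parked. */
bool vp_is_unparked(uint64_t strand_running, unsigned n)
{
    return n < 64 && ((strand_running >> n) & 1u) != 0;
}
```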


Updating the STRAND_RUNNING Register. When a virtual processor parks 
itself by updating the STRAND_RUNNING register and follows the update with a 
FLUSH instruction, no instruction after the FLUSH instruction will be executed until 
the virtual processor is unparked. The virtual address specified in the FLUSH 
instruction does not matter. The FLUSH instruction itself may be executed either 
before parking takes effect or after the virtual processor is unparked. The FLUSH can, 
therefore, enable software to bound when parking takes effect, in the case when a 
virtual processor parks itself. 


IMPL. DEP. #428-S10: When a virtual processor writes to the STRAND_RUNNING 
register to park itself, the method by which completion of parking is assured 
(instructions stop being issued) is implementation dependent. 


Simultaneous Updates to the STRAND_RUNNING Register. Hardware is not 
required to provide a mechanism for handling simultaneous updates from different 
strands to the STRAND_RUNNING register. 
Programming Note: It is the responsibility of hyperprivileged software to ensure that 
a livelock condition, resulting from simultaneous updates from different strands to 
the STRAND_RUNNING register, does not occur. 


After writing to STRAND_RUNNING with an STXA instruction, hyperprivileged 
software should check the STRAND_RUNNING_STATUS register to verify when the 
attempted parking or unparking of virtual processor(s) actually completed. 
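A bounded polling loop is one way to follow the programming note above. This C sketch is a hypothetical illustration. `read_status` stands in for an LDXA of STRAND_RUNNING_STATUS. The poll limit is an assumption of the sketch, justified only by the requirement that the skid be bounded (impl. dep. #427-S10). A simulated register that lags by a fixed number of reads is included so that the sketch can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: in real hyperprivileged code this would be an
 * LDXA from ASI_CMT_SHARED at the STRAND_RUNNING_STATUS address. */
typedef uint64_t (*status_reader_fn)(void *ctx);

/* Wait until the STRAND_RUNNING_STATUS bits selected by mask match the
 * expected value (the bits just written to STRAND_RUNNING).
 * Returns true on success, false if max_polls reads were not enough. */
bool wait_for_strand_status(status_reader_fn read_status, void *ctx,
                            uint64_t mask, uint64_t expected,
                            unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & mask) == (expected & mask))
            return true;
    }
    return false;
}

/* Simulated status register for testing only: it reports the old value
 * for skid_reads reads, then reports the new value. */
struct sim_status {
    uint64_t old_value;
    uint64_t new_value;
    unsigned skid_reads;
};

uint64_t sim_read_status(void *ctx)
{
    struct sim_status *s = ctx;
    if (s->skid_reads > 0) {
        s->skid_reads--;
        return s->old_value;
    }
    return s->new_value;
}
```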





At Least One Virtual Processor Must Remain Unparked. Hardware enforces 
the restriction that an update to the STRAND_RUNNING register by software 
running on one of the virtual processors cannot cause all of the enabled virtual 
processors to become parked. This restriction is important to avoid the dangerous 
situation where all virtual processors become parked and there is no way to 
reactivate any of the virtual processors (without a warm reset or power-on reset). 


IMPL. DEP. #429-S10: If an update to the STRAND_RUNNING register would cause 
all enabled virtual processors to become parked, it is implementation dependent 
which virtual processor is automatically unparked by hardware. The preferred 
implementation is that when an update to the STRAND_RUNNING register (STXA 
instruction) would cause all virtual processors to become parked, hardware silently 
ignores (discards) that STXA instruction. 
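The preferred behavior described in impl. dep. #429-S10 can be sketched in C. This is a model, not a hardware description. `proposed` is assumed to be the value that would result from the STXA after RW, W1S, or W1C semantics have been applied. Implementations that instead unpark some virtual processor are not modeled, and the function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of the preferred behavior of impl. dep. #429-S10: an update
 * that would leave no enabled virtual processor unparked is discarded,
 * so the register keeps its current value. */
uint64_t strand_running_update(uint64_t current, uint64_t proposed,
                               uint64_t enable_status)
{
    uint64_t next = proposed & enable_status; /* disabled bits forced to 0 */
    if (next == 0)
        return current;                       /* silently ignore the STXA */
    return next;
}
```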


Implementation Note: When a virtual processor issues an update to the 
STRAND_RUNNING register that would cause all virtual processors to become 
parked, it is important that the issuing virtual processor itself not be parked. A 
virtual processor updating the STRAND_RUNNING register will be executing a 
section of software (error diagnostic or other special code) that is aware of the 
behavior and implications of parking. When an attempt is made to park all virtual 
processors, automatically unparking an arbitrary virtual processor would be 
problematic, because a virtual processor in the midst of running nonprivileged code 
could become the only unparked virtual processor. If this were to happen, the only 
active virtual processor in the CMT would be unaware of the state of the CMT and 
would not know to check the running status of other virtual processors. 





At Least One Virtual Processor Must Remain Unparked — Multiprocessor 
Configuration. When there are multiple processors (chips) in the configuration, 
there is still a requirement to have at least one virtual processor unparked on each 
processor. However, from a testing point of view, it is desirable to be able to unpark 
all but one virtual processor in the entire multiprocessor configuration. 
IMPL. DEP. #430-S10: In a multiprocessor configuration, whether all but one virtual 
processor can be parked is implementation dependent. 


State After Reset. Upon power-on reset or warm reset, the STRAND_RUNNING 
register by default is initialized such that all the virtual processors are parked except 
for the lowest-numbered enabled virtual processor. This provides a default on-chip 
“boot master” virtual processor, reducing BootBus contention. 
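Isolating the lowest set bit of the enable mask gives this default reset value. In the following C sketch, `enable` is assumed to hold the mask of enabled virtual processors, and the function name is hypothetical.

```c
#include <stdint.h>

/* Default STRAND_RUNNING value after POR or warm reset: only the
 * lowest-numbered enabled virtual processor stays unparked.
 * x & (~x + 1) isolates the lowest set bit (0 if none is enabled). */
uint64_t strand_running_reset_value(uint64_t enable)
{
    return enable & (~enable + 1);
}
```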


Note: For systems that use a system reset pin, the value of the STRAND_RUNNING 
register is updated upon assertion of the warm reset signal. 


15.4.3.2 Strand Running Status Register (STRAND_RUNNING_STATUS) 





Register Name          ASI # (Name)            VA     Scope     Access  Note 
STRAND_RUNNING_STATUS  41₁₆ (ASI_CMT_SHARED)   58₁₆   per-proc  R only 


FIGURE 15-7 STRAND_RUNNING_STATUS Register (one 64-bit field, bits 63 to 0) 


STRAND_RUNNING_STATUS is a shared (one per processor) register. It indicates 
whether a virtual processor is still active (running) or has actually become parked. It 
is needed because there may be a delay between the time when a virtual processor is 
directed to park (via the STRAND_RUNNING register) and the time when it actually 
becomes parked. The STRAND_RUNNING_STATUS register is a shared, read-only 
register in which bit n indicates whether strand n is active. 


There is an implementation-dependent delay from the time virtual processor n is 
directed to park by writing 0 to bit n of the STRAND_RUNNING register until it 
actually becomes parked (impl. dep. #427-S10). 


As shown in FIGURE 15-7, the STRAND_RUNNING_STATUS register has one 64-bit 
field (one bit per possible virtual processor), with bit n corresponding to virtual 
processor n. 


• If virtual processor n is enabled (STRAND_ENABLE_STATUS{n} = 1): 


  - A value of 0 in bit n of the STRAND_RUNNING_STATUS register indicates that 
    virtual processor n is truly parked and will not execute any additional 
    instructions or initiate new transactions until it is unparked. 


  - A value of 1 in bit n of the STRAND_RUNNING_STATUS register indicates that 
    virtual processor n is active and can execute instructions and initiate 
    transactions. All virtual processors that have a 1 in the STRAND_RUNNING 
    register must have a 1 in the STRAND_RUNNING_STATUS register. 


• If virtual processor n is disabled (STRAND_ENABLE_STATUS{n} = 0), bit n of the 
  STRAND_RUNNING_STATUS register must be 0. 
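The rules in the list above can be turned into a consistency check on a snapshot of the three registers, for example in a diagnostic routine. The following C sketch is hypothetical, and so are its function names. It also computes which virtual processors have been directed to park but still report as active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of a register snapshot against the rules above:
 *  - a disabled virtual processor must read 0 in STRAND_RUNNING_STATUS;
 *  - every bit set in STRAND_RUNNING must also be set in the status. */
bool strand_status_consistent(uint64_t running, uint64_t running_status,
                              uint64_t enable_status)
{
    bool disabled_ok = (running_status & ~enable_status) == 0;
    bool running_ok  = (running & ~running_status) == 0;
    return disabled_ok && running_ok;
}

/* Virtual processors directed to park that have not yet parked (skid). */
uint64_t strands_still_parking(uint64_t running, uint64_t running_status)
{
    return running_status & ~running;
}
```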


The STRAND_RUNNING_STATUS register indicates when a virtual processor that has 
been directed to park has actually parked, that is, is no longer executing instructions 
or initiating any transactions (except in response to coherency transactions generated 
by other virtual processors). 


IMPL. DEP. #431-S10: The criteria used for determining whether a virtual processor 
is fully parked (corresponding bit set to ‘0’ in the STRAND_RUNNING_STATUS 
register) are implementation dependent. 


After bit n in the STRAND_RUNNING register has been changed from 1 to 0, 
hardware must guarantee that only a single transition from 1 to 0 in bit n of the 
STRAND_RUNNING_STATUS register will be observed. 


State After Reset. The value of the STRAND_RUNNING_STATUS register is the 
same as the value of the STRAND_RUNNING register at the end of a system reset. 


15.4.4 Virtual Processor Standby (or Wait) State 


IMPL. DEP. #432-S10: Whether an implementation implements a Standby (or 
Wait) state for virtual processors, how that state is controlled, and how that state is 
observed are implementation-dependent. 


In a Standby state, the virtual processor is suspended for a predetermined period of 
time and/or until an external interrupt is received. A Standby state may appear 
similar to a Parked state, but virtual processor Standby state (if implemented) 
must be completely orthogonal to parking. The details of the software interface to 
and implementation of the Standby/Wait state are beyond the scope of this 
specification. 


With respect to parking, the virtual processor is either Running or not running 
(Parked), as indicated in the STRAND_RUNNING_STATUS register. With respect to 
standby, the virtual processor is either in Standby or Normal state. Since these 
features are independent, the virtual processor can be in any of the four possible 
combinations of these states. A virtual processor is still considered running if it is in 
a Standby mode but is not Parked. If a virtual processor is in a Standby mode 
and becomes Parked, it will remain Parked even if an event causes it to change 
from Standby to Normal mode; it will not execute instructions until it is later 
unparked. 
Implementing a Standby mode may provide performance and/or power-consumption 
benefits. A virtual processor in Standby mode may cause less 
resource contention with other running virtual processors and may consume less 
power. 





15.5 Reset and Trap Handling 


In a CMT, some resets apply globally to all virtual processors, some apply to an 
individual virtual processor, and some apply to an arbitrary subset of virtual 
processors. The following sections address how each type of reset affects the virtual 
processors in a CMT. 


The reset nomenclature used in this section is generally consistent with that used for 
UltraSPARC Architecture 2005 processors. If future processors classify resets 
differently, this model should be extended appropriately to the new classifications. 


Traps (as opposed to resets) apply to individual virtual processors and are discussed 
in Traps on page 453. 


15.5.1 Per-Strand Resets (SIR and WDR Resets) 


The only resets that affect just a single virtual processor are those that are internally 
generated by a virtual processor, such as software-initiated reset (SIR) and watchdog 
reset (WDR). These resets are generated by an individual virtual processor and are 
not propagated to the other virtual processors in a CMT. 


15.5.2 Full-Processor Resets (POR and WMR Resets) 


There is a class of resets that are generated by an external agent and apply to all the 
virtual processors within a processor. This class includes all resets associated with 
fundamental CMT reconfigurations. 


power_on_reset (POR) is one case of full-processor reset. Warm reset is another 
example of such a reset (warm reset may be either processor or physical strand- 
specific, depending on the implementation). Full-processor reset is required for 
certain reconfigurations of the processor. 


Power-on reset and warm reset (or their equivalents in future processors) are global 
resets, sent to all strands in a CMT processor. 
15.5.2.1 Boot Sequence 


As discussed in Strand Running Register (STRAND_RUNNING) on page 545, the 
default boot sequence is for all virtual processors except one (nominally, the 
lowest-numbered enabled virtual processor) to be set to Parked state at the beginning 
of a full-processor reset. The single unparked virtual processor is the master virtual 
processor, which should arbitrate for the BootBus (if multiple CMT processors share 
the same BootBus). The master virtual processor (or service processor) should 
unpark the other virtual processors in the processor at the appropriate time in the 
booting process. 


15.5.3 Partial-Processor Resets (XIR Reset) 


There is a class of resets, referred to here as “partial-processor resets,” that are 
generated by an external agent and affect an arbitrary subset of virtual processors 
within a processor. The subset may be anything from all virtual processors to no 
virtual processors (impl. dep. #433-S10). 


Externally-initiated reset (XIR) is a partial-processor reset. XIR is intended to reset a 
specific virtual processor in a system, primarily for diagnostic and recovery 
purposes. 


IMPL. DEP. #433-S10: A mechanism must exist to specify which subset of virtual 
processors in a processor should be reset when a partial-processor reset (for 
example, XIR) occurs. The specific mechanism is implementation-dependent. 


Possible methods of specifying the subset include the following: 


1. Before the partial-processor reset occurs, set up a steering register that specifies 
the subset of virtual processors that should be affected. For systems using an XIR 
reset, the XIR Steering register described in XIR Steering Register 
(XIR_STEERING) on page 552 should be used. 


2. Specify the subset of virtual processors concurrently with the reset request, across 
the same interface used for communicating the reset. This method would require 
that the interface used for communicating resets supports sending packets of 
information along with the resets. 


In an implementation that replaces the XIR reset with a different set of resets, the 
following rules apply for extending this CMT programming interface: 


• Each partial-processor reset may use an interface where the set of virtual 
  processors to reset is communicated along with the reset request. 


• For partial-processor resets for which the set of virtual processors to be reset is not 
  communicated along with the reset request: 


  - The highest-priority reset will use the XIR_STEERING register to 
    determine the subset of virtual processors to be reset. 


  - Each subsequent lower-priority reset can either use the XIR_STEERING 
    register or use an additional steering register (comparable to XIR_STEERING), 
    specifically associated with that reset. Each additional steering register will be 
    accessed using the same ASI number (41₁₆) as the XIR_STEERING register but 
    with a distinct virtual address. 


15.5.3.1 XIR Steering Register (XIR_STEERING) 














Register Name   ASI # (Name)            VA     Scope     Access  Note 
XIR_STEERING    41₁₆ (ASI_CMT_SHARED)   30₁₆   per-proc  RW      General access 


FIGURE 15-8 XIR_STEERING Register (one field, XIR Steering bits, bits 63 to 0) 


An externally initiated reset (XIR) can be steered to an arbitrary subset of virtual 
processors, using the XIR_STEERING register. The XIR_STEERING register is 
shared across virtual processors and is used by software to control which virtual 
processor(s) within a processor will receive the XIR reset signal when XIR is asserted 
for the processor module. 


As shown in FIGURE 15-8, the XIR_STEERING register has one 64-bit field (one bit 
per possible virtual processor), in which bit n corresponds to virtual processor n. 


When an external reset is asserted for the CMT, if bit n in the XIR_STEERING 
register is 1, virtual processor n receives an XIR reset; if bit n in the XIR_STEERING 
register is 0, virtual processor n continues execution, unaware of the external reset 
asserted for the CMT. 


A virtual processor that is parked when it receives an XIR reset remains parked and 
will handle the XIR reset immediately after being unparked. 
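The XIR delivery rule, including the deferral for parked virtual processors, can be sketched in C. The sketch assumes that `running_status` (a snapshot of STRAND_RUNNING_STATUS) is used to tell whether a steered virtual processor is parked. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of XIR delivery when XIR is asserted for the
 * processor. Virtual processors whose XIR_STEERING bit is 0 continue
 * executing. A steered virtual processor that is parked stays parked
 * and handles the XIR only after it is unparked. */
struct xir_delivery {
    uint64_t handled_now;   /* steered and currently running */
    uint64_t deferred;      /* steered but parked: pending until unparked */
};

struct xir_delivery xir_deliver(uint64_t xir_steering, uint64_t running_status)
{
    struct xir_delivery d;
    d.handled_now = xir_steering & running_status;
    d.deferred    = xir_steering & ~running_status;
    return d;
}
```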


IMPL. DEP. #325-U4a: Whether XIR_STEERING{n} is a read-only bit or a read/write 
bit is implementation dependent. If XIR_STEERING{n} is read-only, then (1) 
writes to XIR_STEERING{n} are ignored and (2) XIR_STEERING{n} is set to 1 if 
virtual processor n is available and to 0 if it is not available (that is, 
XIR_STEERING{n} reads the same as STRAND_AVAILABLE{n}). 


It may be desirable for an XIR to effectively unpark and reset all virtual processors 
in a CMT. If so, that effect can be achieved by having the first action of software on 
a virtual processor that receives an XIR be to unpark all other virtual processors in 
the CMT. 
State After Reset. 


During a power-on reset, the contents of the STRAND_AVAILABLE register are 
copied to the XIR_STEERING register. During a warm reset, the contents of the 
STRAND_ENABLE register are copied to the XIR_STEERING register. This provides 
for a default condition in which all enabled virtual processors receive an XIR reset 
when an external reset is asserted for the processor (impl. dep. #325-U4b). 





15.6 Error Handling in CMT Processors 


Errors in a structure private to a virtual processor are considered virtual- 
processor-specific (strand-specific) and are reported to that virtual processor using 
its error-reporting mechanism. 


When an error occurs in a structure shared among virtual processors: 


• If the virtual processor initiating the request that caused or detected the error can 
  be identified, the error is considered virtual-processor-specific and is reported 
  back to the originating virtual processor. 


• If the virtual processor initiating the request that caused or detected the error 
  cannot be identified, the error is considered non-virtual-processor-specific. 


• All virtual processors that share a structure are considered to be part of the 
  error-handling group for that structure. This implies that any virtual processor in 
  the group can be assigned to handle error traps associated with the structure and 
  have diagnostic access to the structure for error recovery. 


The following sections describe how a CMT processor handles both 
virtual-processor-specific and non-virtual-processor-specific errors. 


15.6.1 Virtual-Processor-Specific Error Reporting 


Errors specific to a particular virtual processor are reported to the virtual processor 
associated with the error, using the virtual processor's error reporting mechanism. A 
virtual-processor-specific error can be either synchronous or asynchronous. It may 
be an error that occurred in a shared structure but is traceable to the originating 
virtual processor. It is the responsibility of error handling software to recognize the 
implication of errors in shared structures and take appropriate action. 
15.6.2 Reporting Errors on Shared Structures 


Errors in shared structures are more complicated than virtual-processor-specific 
errors. When a non-virtual-processor-specific error occurs, it must be recorded and 
an exception must be generated on one of the virtual processors within the CMT 
processor to deal with the error. More precisely, the virtual processor that reports the 
exception must be part of the error-handling group for the shared structure in which 
the error was detected. The following subsections describe where the error should be 
recorded and in which virtual processor the exception should be generated. 


15.6.2.1 Error Steering 


When an error occurs in a shared resource, the error must be reported to a virtual 
processor that shares that resource and is part of its error-handling group. That 
virtual processor has the capability of issuing diagnostic reads and writes to the 
structure for diagnosis, correction, and error-clearing purposes. Error steering 
registers are used to determine which virtual processor will handle the error. 
Software configures an error steering register to specify which virtual processor 
should handle the error(s) associated with that error steering register. That is, an 
error steering register defines in which virtual processor an exception will be 
generated, to report and handle the error. 


A given CMT implementation may contain resources shared by all the virtual 
processors of the CMT processor or shared by a subset of two or more virtual 
processors. 


IMPL. DEP. #434-S10: Because of the range of implementation, the number of, 
organization of, and ASI assignments for error steering registers in a CMT processor 
are implementation dependent. 


Error steering registers may be provided per shared resource or per level of sharing. 
If all shared resources are shared by all virtual processors, it is recommended that a 
single error steering register be used and that it follow the behavior of the 
ERROR_STEERING register defined in Error Steering Register (ERROR_STEERING) 
on page 556. If error steering registers are provided per level of sharing, it is 
recommended that the ERROR_STEERING register be used for the level at which all 
virtual processors share and provide error-handling groups. 


General Guidelines for Error Steering Registers. An error steering register 
controls which virtual processor handles non-virtual-processor-specific errors. Such 
an error is recorded using the virtual processor's asynchronous error reporting 
mechanism (as relevant to the error) and generates an appropriate exception. 


An error steering register is accessed through an ASI or a memory-mapped address. 
It must be accessible for both reading and writing by software (using load and store 
alternate instructions). 
A processor contains one or more error steering registers. The number of error 
steering registers needed depends on how resources are shared and the ability of a 
virtual processor to diagnose errors in a resource it does not share. 


An error steering register specifies a virtual processor by an encoded field, target_id, 
that corresponds to the strand id of the targeted virtual processor. Use of an 
encoded representation guarantees that only one virtual processor can be specified. 
An error steering register should contain only one field, the target_id field, that 
encodes the strand id of the virtual processor that should be informed of 
non-virtual-processor-specific errors in its sharing group. 


IMPL. DEP. #326-U4-Cs10a: The number of implemented bits of 
ERROR_STEERING.target_id is nominally six, but is implementation dependent 
and must be sufficient to encode the highest implemented virtual processor ID. 


It is the responsibility of software to ensure that an error steering register identifies 
an appropriate virtual processor for handling the error(s) assigned to it. If an error 
steering register identifies a virtual processor that is not available (per 
STRAND_AVAILABLE) or is disabled (per STRAND_ENABLE_STATUS), none of the 
enabled virtual processors in the error-handling group will be affected by the 
reporting of a non-virtual-processor-specific error to the disabled virtual processor. 
However, the behavior of the specified disabled virtual processor is undefined; for 
example, the error status register in the disabled virtual processor may or may not 
be observed to have been updated. 


If an error steering register identifies a virtual processor that is not part of the error- 
handling group, operation is also undefined. An example would be if the error 
steering register identifies a virtual processor in another error-handling group for a 
virtual-processor-specific error. To avoid this case, an error steering register should 
be assigned on a core basis for core errors that are non-virtual-processor-specific. 


If an error steering register identifies a virtual processor that is parked, the non- 
virtual-processor-specific error is reported to that virtual processor and the virtual 
processor will observe the appropriate exception, but not until after it is unparked. 


When an error steering register is written by software, the update becomes visible 
after an unspecified delay. If a store to the register is followed by a MEMBAR 
synchronization barrier instruction, it is guaranteed that the write to the error 
steering register will complete by the time the execution of the MEMBAR instruction 
completes. 


When a non-virtual-processor-specific error occurs, the corresponding error steering 
register is consulted. The error is reported to and an exception is generated in the 
virtual processor indicated by the error steering register. 


If a non-virtual-processor-specific error occurs and at the same time target_id is 
being changed in the corresponding error steering register, the subsequent error 
report and the generated exception will occur together on the same virtual processor, 
either the virtual processor indicated by the old value in the error steering register or 
the one indicated by the new value. That is, for non-virtual-processor-specific 
errors, the generation of an error report plus an exception is atomic with respect to 
changes to the contents of the error steering register. 


State of Error Steering Register After Reset. 


The target_id field of an error steering register is initialized during a power-on reset 
and warm reset. After a power-on reset, the value in the target_id field of an error 
steering register should refer to the lowest-numbered available virtual processor (as 
indicated by the STRAND_AVAILABLE register) that corresponds to the resource(s) 
covered by the steering register. After a warm reset, the value in the target_id field of 
an error steering register should refer to the lowest-numbered enabled virtual 
processor (as indicated by the STRAND_ENABLE register) that corresponds to the 
resource(s) covered by the steering register. 
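The choice of the lowest-numbered candidate can be sketched in C. The caller is assumed to pass STRAND_AVAILABLE (after power-on reset) or STRAND_ENABLE (after warm reset) as `candidates`, together with a hypothetical `group_mask` of the virtual processors that correspond to the covered resource(s). The sketch returns -1 when there is no candidate, a case the text does not address.

```c
#include <stdint.h>

/* Hypothetical computation of an error steering register's reset
 * target_id: the lowest-numbered candidate virtual processor among
 * those covered by the register. Returns -1 if there is none. */
int error_steering_reset_target(uint64_t candidates, uint64_t group_mask)
{
    uint64_t m = candidates & group_mask;
    for (int n = 0; n < 64; n++) {
        if ((m >> n) & 1u)
            return n;
    }
    return -1;
}
```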


Error Steering Register (ERROR_STEERING). The ERROR_STEERING 
register is the recommended mechanism for specifying which virtual processor in an 
error-handling group should handle non-virtual-processor-specific errors in 
resources shared by all virtual processors of the error-handling group. 


ERROR_STEERING is a shared register, accessible from all virtual processors in the 
error-handling group. 


When a non-virtual-processor-specific error occurs, the error is recorded using the 
asynchronous error reporting mechanism in the virtual processor indicated by 
ERROR_STEERING. The appropriate exception is generated in that same virtual 
processor. 


Register Name    ASI # (Name)   VA             Scope     Access  Note 
ERROR_STEERING   (impl. dep.)   (impl. dep.)   per-proc  RW 


The Error Steering register has only one field that encodes the strand ID of the 
strand that should be informed of non-virtual-processor-specific errors. When an 
error is detected that cannot be traced back to a specific virtual processor, the error 
is recorded in, and a trap is sent to, the virtual processor identified by the Error 
Steering register. 


FIGURE 15-9 ERROR_STEERING Register (Reserved: bits 63 to n; target_id: bits n-1 to 0) 





IMPL. DEP. #435-S10: Although the ERROR_STEERING register is the 
recommended mechanism for steering non-virtual-processor-specific errors to a 
virtual processor for handling, the actual mechanism used in a given 
implementation is implementation dependent. 
The ERROR_STEERING register contains one field, target_id, that encodes the 
virtual processor ID of the virtual processor that should be informed of 
non-virtual-processor-specific errors (see FIGURE 15-9). 


IMPL. DEP. #436-S10: The width of the target_id field of the ERROR_STEERING 
register is implementation dependent. 


The target_id field (refer to FIGURE 15-9) must be wide enough to encode the strand 
ID of the highest-numbered implemented virtual processor. If n bits of this field are 
implemented, the unused most-significant bits of the nominal six-bit field (bits 5 
through n, if any) read as zero and writes to those bits are ignored. 
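The effect of a partially implemented target_id field on writes can be sketched in C. The sketch assumes that target_id occupies bits width-1 through 0, as in FIGURE 15-9, with reserved bits reading as zero. `width` is implementation dependent (nominally six), and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a write to ERROR_STEERING whose target_id field
 * has `width` implemented bits (bits width-1..0). Bits at or above
 * `width` are ignored on write and read back as zero. */
uint64_t error_steering_write(uint64_t value, unsigned width)
{
    if (width >= 64)
        return value;
    return value & ((UINT64_C(1) << width) - 1);
}

/* Nonzero if strand_id survives the write unchanged, meaning the
 * implemented width is sufficient to encode it. */
int target_id_fits(unsigned strand_id, unsigned width)
{
    return error_steering_write(strand_id, width) == strand_id;
}
```

A check such as `target_id_fits` reflects the requirement that the field be wide enough for the highest-numbered implemented virtual processor. If it returns 0, the write would silently steer errors to a different strand.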


IMPL. DEP. #437-S10: An implementation may provide multiple target_id fields in 
an ERROR_STEERING register for different types of non-virtual-processor-specific 
errors. 


15.6.2.2 Reporting Non-Virtual-Processor-Specific Errors 


Before an exception can be generated for a non-virtual-processor-specific error, the 
error must be recorded. Non-virtual-processor-specific errors are recorded using the 
asynchronous error reporting mechanism of the virtual processor specified by the 
ERROR_STEERING register. The mechanism used is the same as that for reporting 
virtual-processor-specific errors. 


Each asynchronous error is defined as either virtual-processor-specific or non- 
virtual-processor-specific. If the same error can occur as either a virtual-processor- 
specific error or a non-virtual-processor-specific error, the two cases must be 
reported as two identifiably distinct errors. 


IMPL. DEP. #438-S10: It is implementation dependent whether the error-reporting 
structures for errors in shared resources appear within a virtual processor in per- 
virtual-processor registers or are contained within shared registers associated with 
the shared structures in which the errors may occur. 


IMPL. DEP. #439-S10: The type of exception generated in a virtual processor to 
handle each type of non-virtual-processor-specific error is implementation 
dependent. A virtual processor can choose to use the same exceptions used for 
corresponding virtual-processor-specific asynchronous errors or it can choose to 
generate different exceptions. 
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15.7 


152541 


15.7.2 


15.7.3 


15.7.4 


Additional CMT Software Interfaces 


Diagnostic/RAS Registers 


The CMT software interface defines how virtual processors are disabled or parked 
(for diagnostic and error recovery) and how errors are reported in a CMT processor. 
It is up to the implementation to provide appropriate diagnostic and recovery 
mechanisms, which are not specified here. 


A future extension of the CMT programming model may include more common 
features for diagnostics and RAS. Increasing commonality without significantly 
limiting the implementation options is best. 


Configuration Registers 


Given the broad range of possible implementations, no common configuration 
interface is defined here. 


At this time the CMT programming model does not specify any common 
configuration registers. A future extension of the CMT programming model may 
include some. Increasing commonality without significantly limiting the 
implementation options is best. 


15.7.3 Performance Registers


At this time, no common performance registers are specified. A future extension of 
the CMT programming model may include some. 


This is a particularly important area for common features. A range of software
tools rely on the performance registers, and common features would allow those
tools to be deployed on new architectures more quickly and with less work.


15.7.4 Booting Support


Some of the registers previously described can be used by firmware for booting 
support. See Strand Running Register (STRAND_RUNNING) on page 545 for an 
example of such a register. 


During a power-on reset, only one enabled virtual processor per processor is
unparked, and only that virtual processor begins fetching instructions after the reset.
IMPL. DEP. #440-S10: Which virtual processor is unparked during POR, and
whether it is unparked by processor hardware or by a service processor, is
implementation dependent. Conventionally, the virtual processor with the
lowest-numbered strand ID is unparked.


In a recommended booting sequence, software determines when virtual processors 
become unparked after reset. The default behavior is for only one virtual processor 
to be unparked when the system reset signal is removed. That virtual processor, in 
turn, configures common registers and then unparks other virtual processors one at 
a time. This is only one possible boot sequence; software is free to implement other 
boot sequences. 
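The recommended sequence can be sketched in C as a small simulation. It is not firmware: the registers are modeled as ordinary variables, and `struct cmt_regs`, `lowest_strand`, `write_running_w1s`, and `boot_unpark_others` are hypothetical names. Real hyperprivileged code would instead read STRAND_ENABLE_STATUS and write STRAND_RUNNING_W1S through ASI 41₁₆ accesses (see TABLE 15-2). The sketch also assumes the conventional choice from impl. dep. #440, namely that the lowest-numbered enabled strand is the boot strand.

```c
#include <stdint.h>

/* Hypothetical software model of two shared CMT registers (one bit per strand). */
struct cmt_regs {
    uint64_t strand_enable_status; /* enabled virtual processors */
    uint64_t strand_running;       /* 1 = active (unparked), 0 = parked */
};

/* Index of the lowest-numbered set bit, or -1 if the mask is empty. */
static int lowest_strand(uint64_t mask)
{
    for (int i = 0; i < 64; i++)
        if (mask & (UINT64_C(1) << i))
            return i;
    return -1;
}

/* Models a write to STRAND_RUNNING_W1S: each 1 bit unparks that strand. */
static void write_running_w1s(struct cmt_regs *r, uint64_t value)
{
    r->strand_running |= value;
}

/*
 * Boot strand (assumed): the lowest-numbered enabled strand, the only one
 * running after reset. It would configure common registers here (not
 * modeled), then unpark the other enabled strands one at a time.
 * The unpark order is recorded in order[]; returns the number unparked.
 */
static int boot_unpark_others(struct cmt_regs *r, int order[64])
{
    int boot = lowest_strand(r->strand_enable_status);
    int n = 0;

    if (boot < 0)
        return 0;                      /* no enabled strand */
    r->strand_running = UINT64_C(1) << boot;
    for (int i = boot + 1; i < 64; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (r->strand_enable_status & bit) {
            write_running_w1s(r, bit); /* one strand per write */
            order[n++] = i;
        }
    }
    return n;
}
```

With strands 2, 3, and 5 enabled (mask 2C₁₆), strand 2 boots and then unparks strand 3 followed by strand 5. Unparking one strand per write mirrors the "one at a time" sequence described above; software is free to choose a different order.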





15.8 Performance Issues for CMT Processors


Which resources are shared among which virtual processors in a CMT processor is
implementation dependent. Resources such as caches, TLBs, and even execution
pipelines may be shared by virtual processors, and this sharing raises significant
performance issues. This section addresses the hyperprivileged software issues of
thread scheduling and of configuring inactive virtual processors. It does not cover
how to develop algorithms and approaches that take advantage of the low
communication latencies between virtual processors.


Understanding and taking advantage of performance issues in a CMT processor
requires some knowledge of the underlying implementation. Implementation
dependencies are unavoidable, but abstract representations and general approaches
can, it is hoped, reduce the degree of implementation dependence in
hyperprivileged software.





15.9 Recommended Subset for Single-Strand Processors


It is recommended that single-strand UltraSPARC Architecture 2005 processors 
implement a subset of the CMT interface. This enables them to more easily integrate 
into systems that may also contain CMT processors and enables more consistent 
software to be deployed across those and other future systems. 


Single-strand UltraSPARC Architecture 2005 processors should implement all of the 
CMT registers described in this chapter, as follows: 




• The Strand Interrupt ID register (STRAND_INTR_ID) should be fully
implemented.


• All other registers can be implemented as read-only registers containing fixed
values; writes to them are ignored.


TABLE 15-1 summarizes the recommended implementation of CMT registers for a 
single-strand processor implementation: 


TABLE 15-1 Recommended CMT Register Set for Single-Strand Processors 











ASI    VA     Register Name            Type                Note
41₁₆   00₁₆   STRAND_AVAILABLE         R only              Read value of 01₁₆
       10₁₆   STRAND_ENABLE_STATUS     R only              Read value of 01₁₆
       20₁₆   STRAND_ENABLE            R only              Read value of 01₁₆
       30₁₆   XIR_STEERING             R only              Read value of 01₁₆
       50₁₆   STRAND_RUNNING_RW        R only              Read value of 01₁₆
       58₁₆   STRAND_RUNNING_STATUS    R only              Read value of 01₁₆
       60₁₆   STRAND_RUNNING_W1S       W only (ignored)    Access (write) ignored
       68₁₆   STRAND_RUNNING_W1C       W only (ignored)    Access (write) ignored
63₁₆   00₁₆   STRAND_INTR_ID           RW                  Software-assigned unique interrupt ID
                                                           for virtual processor (read/write)
       10₁₆   STRAND_ID                R only              Read value of 00₁₆
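The recommended subset amounts to a small decode of (ASI, VA) pairs. The following is a hedged sketch of how a simulator or emulation layer for a single-strand processor might model TABLE 15-1. The names `cmt_read`, `cmt_write`, `ASI_CMT_SHARED`, `ASI_CMT_STRAND`, and `strand_intr_id` are hypothetical, and returning zero for addresses not listed in the table is an assumption of the sketch, not specified behavior.

```c
#include <stdint.h>

#define ASI_CMT_SHARED 0x41u /* shared CMT registers */
#define ASI_CMT_STRAND 0x63u /* per-strand CMT registers */

/* Hypothetical backing store: STRAND_INTR_ID is the only writable register. */
static uint64_t strand_intr_id;

/* Read a CMT register on a single-strand processor (TABLE 15-1). */
static uint64_t cmt_read(unsigned asi, unsigned va)
{
    if (asi == ASI_CMT_SHARED) {
        switch (va) {
        case 0x00: /* STRAND_AVAILABLE */
        case 0x10: /* STRAND_ENABLE_STATUS */
        case 0x20: /* STRAND_ENABLE */
        case 0x30: /* XIR_STEERING */
        case 0x50: /* STRAND_RUNNING_RW */
        case 0x58: /* STRAND_RUNNING_STATUS */
            return 0x01; /* fixed value: only strand 0 exists */
        default:
            return 0;    /* not modeled (assumption) */
        }
    }
    if (asi == ASI_CMT_STRAND) {
        if (va == 0x00)
            return strand_intr_id; /* STRAND_INTR_ID (read/write) */
        if (va == 0x10)
            return 0x00;           /* STRAND_ID: fixed at 0 */
    }
    return 0; /* not modeled (assumption) */
}

/* Writes are ignored except for STRAND_INTR_ID. */
static void cmt_write(unsigned asi, unsigned va, uint64_t value)
{
    if (asi == ASI_CMT_STRAND && va == 0x00)
        strand_intr_id = value;
}
```

A write of FF₁₆ to STRAND_ENABLE, for example, leaves its read value at 01₁₆, while a value written to STRAND_INTR_ID reads back unchanged.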
15.10 Machine State Summary


TABLE 15-2 describes the ASI extensions that support CMT registers. The states of 
CMT registers after resets are enumerated in TABLE 16-2 on page 570. 











TABLE 15-2 ASI Extensions 

ASI    VA        Register Name            Scope        Type    Description
41₁₆   00₁₆      STRAND_AVAILABLE         per-proc     R       Bit mask of implemented virtual
                                                               processors
       10₁₆      STRAND_ENABLE_STATUS     per-proc     R       Bit mask of enabled virtual processors
       20₁₆      STRAND_ENABLE            per-proc     RW      Bit mask of virtual processors to enable
                                                               after next reset (read/write)
       30₁₆      XIR_STEERING             per-proc     RW      Bit mask of virtual processors to
                                                               propagate XIR to (read/write)
       50₁₆      STRAND_RUNNING_RW        per-proc     RW      Bit mask to control which virtual
                                                               processors are active and which are
                                                               parked (read/write): 1 = active,
                                                               0 = parked
       58₁₆      STRAND_RUNNING_STATUS    per-proc     R       Bit mask of virtual processors that are
                                                               currently active: 1 = active, 0 = parked
       60₁₆      STRAND_RUNNING_W1S       per-proc     W1S     Pseudo-register for write-one-to-set
                                                               access to STRAND_RUNNING
       68₁₆      STRAND_RUNNING_W1C       per-proc     W1C     Pseudo-register for write-one-to-clear
                                                               access to STRAND_RUNNING
63₁₆   00₁₆      STRAND_INTR_ID           per-strand   RW      Software-assigned unique interrupt ID
                                                               for virtual processor (read/write)
       10₁₆      STRAND_ID                per-strand   R       Hardware-assigned ID for virtual
                                                               processor (read-only)
       40₁₆ and  Reserved                 per-strand   Impl.   Reserved for implementation-specific
       greater                                         Dep.    per-strand registers
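The three STRAND_RUNNING write paths in TABLE 15-2 differ only in how a written value is combined with the current mask. A minimal model follows, assuming a single 64-bit mask variable and hypothetical function names. It treats STRAND_RUNNING_STATUS as equal to the requested mask, which ignores any latency an implementation might have in parking or unparking a strand.

```c
#include <stdint.h>

/* Hypothetical model of the shared STRAND_RUNNING mask (1 = active, 0 = parked). */
static uint64_t strand_running;

/* VA 0x50, STRAND_RUNNING_RW: the written value replaces the whole mask. */
static void running_rw_write(uint64_t v)  { strand_running = v; }

/* VA 0x60, STRAND_RUNNING_W1S: each 1 bit unparks that strand; 0 bits are ignored. */
static void running_w1s_write(uint64_t v) { strand_running |= v; }

/* VA 0x68, STRAND_RUNNING_W1C: each 1 bit parks that strand; 0 bits are ignored. */
static void running_w1c_write(uint64_t v) { strand_running &= ~v; }

/* VA 0x58, STRAND_RUNNING_STATUS: modeled here as the requested mask (assumption). */
static uint64_t running_status_read(void) { return strand_running; }
```

Because W1S and W1C leave the 0 bits of the written value unaffected, software can change one strand's bit without first reading the mask, which a write to STRAND_RUNNING_RW would require.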
CHAPTER 16


Resets








16.1 Resets


The UltraSPARC Architecture 2005 defines five types of resets. Reset priorities, listed
in order from highest to lowest, are as follows:


• power-on reset (POR)

• warm reset (WMR)

• externally initiated reset (XIR)

• watchdog reset (WDR)

• software-initiated reset (SIR)


POR, WMR, and XIR resets are initiated externally to the processor (chip). WDR and
SIR resets are initiated by the virtual processor itself, in response to specific
conditions.


POR resets are processor-wide (they affect all virtual processors on the chip). WDR and
SIR resets are directed to a specific virtual processor. XIR resets are directed to the
virtual processor(s) indicated by the XIR_STEERING register. WMR resets are
implementation dependent and may be either processor-wide or directed to specific
virtual processor(s).
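The priority order and the origin of each reset can be captured in a few lines of C. This is an illustrative model only: `enum reset_type`, `highest_priority_reset`, and `is_external_reset` are hypothetical, the pending-reset bit mask is an assumed encoding, and real hardware arbitrates resets itself (for example, all other resets and traps are ignored while a POR is in progress).

```c
#include <stdbool.h>

/* Reset types in priority order, highest first (section 16.1). */
enum reset_type { RST_POR, RST_WMR, RST_XIR, RST_WDR, RST_SIR, RST_COUNT };

/* Highest-priority reset in a pending set (bit t = reset type t, an assumed
 * encoding), or -1 if none is pending. */
static int highest_priority_reset(unsigned pending)
{
    for (int t = RST_POR; t < RST_COUNT; t++)
        if (pending & (1u << t))
            return t;
    return -1;
}

/* POR, WMR, and XIR are initiated outside the processor; WDR and SIR by the
 * virtual processor itself. */
static bool is_external_reset(enum reset_type t)
{
    return t == RST_POR || t == RST_WMR || t == RST_XIR;
}
```

For instance, if an XIR and an SIR are pending together, the model selects the XIR.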


Resets are used to initialize a virtual processor and place it in an operating state, to
attempt recovery of a failing or stuck virtual processor, to attempt recovery of failing
privileged operating system software, and for debugging. The defined states for each
reset show an increasing amount of resource reset: for example, an XIR, WDR, or SIR
reset leaves most architectural and memory resources unchanged, a WMR reset
leaves most memory resources unchanged but resets certain architectural resources,
and a POR reset initializes all processor resources.
All resets are processed as traps and place the virtual processor in RED_state.

RED_state (Reset, Error, and Debug state) is a restricted execution state reserved
for processing hardware- and software-initiated resets. Please refer to Reset Traps on
page 466 and to the subsections regarding reset traps in RED_state Trap Processing,
which begins on page 490.





16.1.1 Power-on Reset (POR) 


A POR reset occurs when the assigned POR pin is asserted and deasserted. During 
this time, all other resets and traps are ignored. POR reset has the highest trap 
priority. POR causes any pending external transactions to be cancelled. 


The POR reset is a processor-wide reset. It affects all virtual processors on the chip, 
as well as all IO, cache, and DRAM subsystems. 


During a POR reset, hardware sets registers to a known state (see Machine States on 
page 566). All hardware-based initialization functions are performed, all logic 
(including the pipeline) is initialized, all architectural registers are placed in their 
reset state (as defined in TABLE 16-1 on page 567), and all entries in caches and TLBs 
are invalidated. 


A service processor may also participate in the POR reset process. The POR reset 
functions provided by a service processor are documented in the relevant Service 
Processor specification. 


After a POR reset is complete: 


• The first available¹ virtual processor begins executing at physical address
RSTVADDR + 20₁₆ (the RED_state trap vector base address plus the POR offset of
20₁₆), with a trap type of 01₁₆.

• All other virtual processors are in "parked" state (see Parking and Unparking
Virtual Processors on page 544).


Implementation Note: From the perspective of this specification, which describes a
processor architecture, after a Power-On Reset (POR) execution begins on one strand
of the processor. However, in a multiprocessor system, after POR a service processor
might arrange for execution to initially occur on only one strand per system. Whether
and how that occurs is beyond the scope of this specification and would be described
in system-level documentation.





1. As indicated by the Strand Available register (see Strand Available Register (STRAND_AVAILABLE) on page 541).


Programming Note: After a POR reset, software must initialize values that are
specified as “undefined” in TABLE 16-1. In particular, I-cache tags, D-cache tags, and
L2 cache tags must be initialized before enabling the caches. The ITLB, DTLB, and UTLB
must also be initialized before enabling memory management. If a service processor
participates in the reset, software should also consult the Service Processor
Specification to determine which machine state has been reset by the service processor.





16.1.2 Warm Reset (WMR)


A Warm Reset (WMR) occurs when software writes into a particular 
implementation-dependent reset register or when an implementation-dependent 
reset input pin is asserted and then deasserted. When a WMR reset is received, all 
other resets and traps except POR are ignored. 


The extent to which the processor is reset by a WMR reset is implementation
dependent. A WMR reset may be chip-wide or it may be core-wide (resetting all
virtual processors on the core, but allowing virtual processors on other cores to
continue processing and maintaining cache coherency).


A WMR, even if it is chip-wide, will not alter the contents of external memory. It 
may, however, alter on-chip portions of the memory system (for example, store 
queues or cache(s)). 


Warm reset has the same trap type (1₁₆) and trap vector offset (20₁₆) as a POR reset.
The means by which hyperprivileged software can distinguish between WMR and POR
resets is implementation dependent.


IMPL. DEP. #420-S10: The following aspects of Warm Reset (WMR) are
implementation dependent:

(a) by what means WMR can be applied (for example, write to reset register or
assertion/deassertion of an input pin)

(b) the extent to which a processor is reset by WMR (for example, single physical
core, entire processor (chip), and how the on-chip memory system is affected)

(c) by what means hyperprivileged software can distinguish between WMR and
POR resets


16.1.3 Externally Initiated Reset (XIR)


An externally initiated reset (XIR) is sent by asserting and deasserting an input pin, 
setting and clearing a bit in a reset register, or both. 


An XIR reset is sent to all virtual processors specified in the XIR_STEERING register
(impl. dep. #304-U4-Cs10). It causes an XIR reset trap in each affected virtual
processor. An XIR reset trap has trap type 3₁₆ and uses a trap vector with a physical
address offset of 60₁₆.
Memory state, cache state, and most architectural state (see TABLE 16-1) are
unchanged by an XIR reset. System coherency is guaranteed to be maintained 
during an XIR reset. The PC (NPC) saved in TPC (TNPC) observed after an XIR will 
be mutually consistent, such that execution could resume using the saved PC and 
NPC. In effect, XIR behaves like a non-maskable interrupt. 


16.1.4 Watchdog Reset (WDR) 


An UltraSPARC Architecture virtual processor enters error state when a
non-reset trap or SIR reset occurs at TL = MAXTL.


The virtual processor signals itself internally to take a watchdog reset (WDR) and
sets TT to the trap type of the trap that caused entry to error state. The WDR
causes a trap using a trap vector with a physical address offset of 40₁₆. WDR
affects only the virtual processor on which it occurs; no other virtual processors are
affected.


On a watchdog reset trap caused by a register window-related trap, the CWP register
is updated as it would have been had the WDR not occurred.
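How a trap is handled depends on the trap level at which it occurs. The sketch below classifies a non-reset trap (or an SIR) from the current TL, following this section and section 16.2. The enum and function names are hypothetical, and MAXTL is implementation dependent; the value 6 used in the examples is only an illustration.

```c
/* Outcome of a non-reset trap (or an SIR) as a function of the current TL. */
enum trap_outcome {
    TRAP_NORMAL,     /* ordinary trap processing */
    TRAP_RED_STATE,  /* TL = MAXTL - 1: enter RED_state, state per TABLE 16-1 */
    TRAP_ERROR_WDR   /* TL = MAXTL: error state, then watchdog reset (WDR) */
};

/* Hypothetical classifier; maxtl is the implementation's MAXTL (assumed >= 1). */
static enum trap_outcome classify_trap(unsigned tl, unsigned maxtl)
{
    if (tl >= maxtl)
        return TRAP_ERROR_WDR;
    if (tl == maxtl - 1)
        return TRAP_RED_STATE;
    return TRAP_NORMAL;
}
```

With MAXTL = 6, a trap at TL = 5 enters RED_state, and a trap at TL = 6 enters error state and produces a WDR.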


16.1.5 Software-Initiated Reset (SIR) 


A software-initiated reset is initiated by an SIR instruction executing on a virtual
processor. This virtual processor reset has trap type 4 and uses a trap vector with a
physical address offset of 80₁₆. SIR affects only the virtual processor on which it
executes; all other virtual processors are unaffected.
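The vector offsets given in sections 16.1.1 through 16.1.5 combine with RSTVADDR to give each reset handler's starting address. The sketch below uses the physical RSTVADDR listed in TABLE 16-1, which is itself implementation dependent (impl. dep. #114). The enum and function names are hypothetical. Setting NPC to PC + 4 matches the POR entries in TABLE 16-1 (20₁₆ and 24₁₆); applying the same rule to the other resets is an assumption of the sketch.

```c
#include <stdint.h>

/* RSTVADDR physical address from TABLE 16-1 (impl. dep. #114). */
#define RSTVADDR_PA UINT64_C(0x00007FFFF0000000)

enum reset_kind { RK_POR, RK_WMR, RK_WDR, RK_XIR, RK_SIR };

/* RED_state trap vector offset for each reset. */
static uint64_t reset_vector_offset(enum reset_kind k)
{
    switch (k) {
    case RK_POR:
    case RK_WMR: return 0x20; /* WMR shares POR's trap type and offset */
    case RK_WDR: return 0x40;
    case RK_XIR: return 0x60;
    case RK_SIR: return 0x80;
    }
    return 0; /* not reached for valid enum values */
}

/* Physical address of the first instruction of the reset trap handler. */
static uint64_t reset_pc(enum reset_kind k)
{
    return RSTVADDR_PA + reset_vector_offset(k);
}

/* NPC follows PC by one 4-byte instruction (assumed for all resets). */
static uint64_t reset_npc(enum reset_kind k)
{
    return reset_pc(k) + 4;
}
```

Because the low-order bits of RSTVADDR are zero, adding the offset gives the same result as OR-ing it in, which is how TABLE 16-1 writes the PC value (RSTVADDR | 20₁₆).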





16.2 Machine States 


Machine state changes when a trap is taken at TL = MAXTL - 1 or when a reset occurs.


TABLE 16-1 specifies the machine states observed by software after a trap is taken at
TL = MAXTL - 1 or after a reset occurs. For details of how those machine states are
set, see processor-specific documentation and/or relevant Service Processor
documentation.


The UltraSPARC Architecture specifies the machine state that must be observed by
software after reset.


Implementation Note: For the POR reset (and possibly the WMR reset), the change in
machine state may be accomplished directly by processor hardware or with support
from a service processor.
Programming Note: Virtual processor states are only updated according to TABLE 16-1
if RED_state is entered because of a trap at TL = MAXTL - 1 or a reset. If RED_state
is entered because the HPSTATE.red bit was explicitly set to 1 by software, then
software is responsible for setting the appropriate machine state.





In the following tables, a value marked as "Undefined" may or may not be set to a 
known value by hardware (and/or a service processor) after reset. 


Programming Note: Values marked as "Undefined" after POR in the following tables
should be initialized by software after the power-on reset.


TABLE 16-1 Machine State After Reset or a Trap @ TL = MAXTL - 1 (1 of 3)



























































Name  Fields  POR  WMR  WDR  XIR  SIR  Traps taken @TL = MAXTL - 1

Integer registers  Undefined¹  Unchanged

Floating-point registers Undefined 

RSTVADDR  VA = FFFF FFFF F000 0000₁₆
          PA = 0000 7FFF F000 0000₁₆
          (impl. dep. #114)

PC RSTVADDR | 2014 la Fe T | 

NPC RSTVADDR | 24, und | T | PO | 
tct 0 (Trap on control transfer disabled) 

PSTATE mm 
pe 
am 
B] ————— 9] 
s 
de 
tle 0 (Trap little-endian) Unchanged 
ibe 0 (Instruction breakpoint disabled) 

HPSTATE  red  1 (RED_state)
hpriv 1 (Hyperprivileged mode) 
tlz 0 (trap level zero traps disabled) 

TBA<63:15> tba high49 riis rius 

HTBA-SS: 4s ba. igneo 

Y 

P 

CWP 

mm MT [3 ee] 2 | + | 

CCR Undefined Unchanged 

ASI Undefined Unchanged 
TABLE 16-1 Machine State After Reset or a Trap @ TL = MAXTL - 1 (2 of 3)
Name  Fields  POR  WMR  WDR  XIR  SIR  Traps taken @TL = MAXTL - 1
TL MAXTL min(TL+1, MAXTL) 
GL MAXGL min(GL+1, MAXGL) 
TPC[TL]               Undefined  ‡  PC
TNPC[TL]              Undefined  ‡  NPC
TSTATE[TL]  gl        Undefined  ‡  GL
            ccr                     CCR
            asi                     ASI
            pstate                  PSTATE
            cwp                     CWP
o 
= aaa pido 
ee Bos — | Utne | ssi 
te 
se Unchanged 
thal counter 
CANSAVE  Undefined  Unchanged
CANRESTORE  Undefined  Unchanged
OTHERWIN  Undefined  Unchanged
CLEANWIN  Undefined  Unchanged
WSTATE  other  Undefined  Unchanged
normal 
mani 
mp 
HVER mask 
mad 
mast 
max 
FSR  all  Undefined  Unchanged
GSR  all  Undefined  Unchanged
FPRS  fef
      du  Undefined  Unchanged
      dl
SOFTINT  Undefined  Unchanged
HINT rap 
TICK_CMPR jnt dis — Undefined Unchanged 
tick_cmpr 
STICK CES. aeea eas 
counter Count 
TABLE 16-1 Machine State After Reset or a Trap @ TL = MAXTL - 1 (3 of 3)
Name  Fields  POR  WMR  WDR  XIR  SIR  Traps taken @TL = MAXTL - 1
STICK_CMPR int_dis 1 Unchanged 

stick_cmpr  Undefined  Unchanged

HSTICK_CMPR  int_dis      1          Unchanged
             hstick_cmpr  Undefined  Unchanged
SCRATCHPAD n Undefined Unchanged 
HYP_SCRATCHPAD  n  Undefined†  Unchanged†  Unchanged  Unchanged
I_SFSR, D_SFSR  Undefined  Unchanged
D_SFAR  Undefined  Unchanged
I-cache controls enable(s) 0 (disable I$) Unchanged 
I-cache entries Invalidated Unchanged 
I-cache data Undefined Unchanged 
D-cache controls enable(s) 0 (disable D$) Unchanged 
D-cache entries Invalidated Unchanged 
D-cache data Undefined Unchanged 
MMU controls  enable(s)  0 (disable MMU)  Unchanged
MMU registers all Undefined Unchanged 
ITLB/DTLB/UTLB entries Invalidated Unchanged 
store queue entries Invalidated Unchanged 
L2 cache controls  enable(s)  1 (enable L2$)  Unchanged


L2 cache entries  Invalidated  Unchanged






































L2 cache directory Invalidated Unchanged 
L2 cache data Undefined Unchanged 
Error enable registers Undefined Unchanged 
Error Trap enable Undefined Unchanged 
registers 

Error Status registers  error events  Undefined²  Unchanged
Watchpoint Controls enables 0 (disabled) Unchanged 
Interrupt Queue pointers  all  Undefined  Unchanged
Error Queue pointers all Undefined Unchanged 





† If a service processor is present, it may change the value of hyperprivileged scratchpad register(s) before execution of the reset trap handler begins.


‡ IMPL. DEP. #419-S10: It is implementation dependent whether, after a Warm Reset (WMR), the contents of TPC[TL], TNPC[TL],
TSTATE[TL], and HTSTATE[TL] are unchanged from their values before the WMR, or contain the same values saved as during a WDR,
XIR, or SIR reset. (The latter implementation is the preferred one.)


1. After POR, integer register R[0] must read as zero (with good ECC/parity). The value to which all other integer and all
floating-point registers are set during a Power-On Reset (POR) is Undefined. For an implementation that protects these registers
with ECC/parity, the registers must be initialized with good ECC/parity as part of a POR reset, either by hardware or software.


2. For a POR reset, the Error Status register(s) can be set either by hardware or by a service processor. 
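The TL and GL entries of TABLE 16-1 describe a saturating increment, min(TL+1, MAXTL) and min(GL+1, MAXGL). The sketch below assumes, as the table layout suggests, that the MAXTL and MAXGL entries apply to POR and WMR and that the min() entries apply to WDR, XIR, SIR, and traps taken at TL = MAXTL - 1. The function names are hypothetical.

```c
#include <stdbool.h>

/* min(level + 1, max): the saturating increment used for TL and GL. */
static unsigned sat_inc(unsigned level, unsigned max)
{
    return (level + 1 < max) ? level + 1 : max;
}

/* New TL (or GL). Assumption: POR and WMR force the maximum value; the
 * other resets and traps at TL = MAXTL - 1 use the saturating increment. */
static unsigned level_after(bool por_or_wmr, unsigned level, unsigned max)
{
    return por_or_wmr ? max : sat_inc(level, max);
}
```

Saturation matters because a trap at TL = MAXTL - 1 already raises TL to MAXTL, so any further reset cannot push TL beyond the implemented maximum.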
16.2.1 Machine States for CMT


TABLE 16-2 shows the CMT machine state set by hardware as a result of a trap taken
at TL = MAXTL - 1 or when a reset occurs.


TABLE 16-2 Machine State After Reset and in RED_state for CMT Registers











Name  Fields  POR  WMR  WDR  XIR  SIR  Traps taken @TL = MAXTL - 1
Registers shared among Virtual Processors (Strands)
STRAND_AVAILABLE  Unchanged
(Predefined value, set at time of manufacture)


STRAND_ENABLE_STATUS


STRAND_ENABLE
XIR_STEERING


STRAND_RUNNING and STRAND_RUNNING_STATUS


Copied from STRAND_AVAILABLE† (but may be changed by a service processor
during reset)


Copied from STRAND_AVAILABLE† (impl. dep. #325-U4(b)) (but may be changed by a
service processor during reset)


Set to 0†, then during the reset either (1) virtual processor hardware sets to 1 the
bit in the position corresponding to the lowest-numbered implemented and available
virtual processor (as specified by STRAND_AVAILABLE), or (2) this register is
initialized by a service processor.
Copied from STRAND_ENABLE† (but may be changed by a service processor during
reset)


Unchanged


Unchanged


Copied from STRAND_ENABLE† (but may be changed by a service processor during
reset)

Unchanged


Set to 0†, then during the reset either (1) virtual processor hardware sets to 1 the
bit in the position corresponding to the lowest-numbered enabled virtual processor
(as specified by the value of STRAND_ENABLE before the reset), or (2) this register
is initialized by a service processor.


Unchanged





TABLE 16-2 Machine State After Reset and in RED_state for CMT Registers (Continued)


Name  Fields  POR  WMR  WDR  XIR  SIR  Traps taken @TL = MAXTL - 1





Per-Strand Registers (not shared) 























STRAND_ID  max_strand_id  max strand ID†
           max_core_id    max core ID†           Unchanged
           core_id        core ID† of this core
SORESINTREIR LE interrupt ID t of this core 








† If the implementation is always paired with a service processor and the service processor always initializes
this register during reset, processor hardware can leave this register unchanged (or set it to 0) and allow
the service processor to perform the initialization.
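When hardware (rather than a service processor) initializes STRAND_RUNNING, TABLE 16-2 selects a single strand by isolating the lowest set bit of a mask. A minimal sketch follows; the function names are hypothetical, and it assumes the "implemented and available" cell belongs to the POR column and the "enabled ... before the reset" cell to the WMR column.

```c
#include <stdint.h>

/* Isolate the lowest set bit of a mask (0 if the mask is empty). */
static uint64_t lowest_set_bit(uint64_t mask)
{
    return mask & (~mask + 1); /* two's-complement identity: mask & -mask */
}

/* POR: lowest-numbered implemented and available strand (STRAND_AVAILABLE). */
static uint64_t strand_running_after_por(uint64_t strand_available)
{
    return lowest_set_bit(strand_available);
}

/* WMR: lowest-numbered enabled strand (STRAND_ENABLE before the reset). */
static uint64_t strand_running_after_wmr(uint64_t strand_enable_before)
{
    return lowest_set_bit(strand_enable_before);
}
```

For example, if strands 4 through 7 are available (mask F0₁₆), only strand 4 (mask 10₁₆) is left running after POR.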
APPENDIX A





Opcode Maps 


This appendix contains the UltraSPARC Architecture 2005 instruction opcode maps. 


In this appendix and in Chapter 7, Instructions, certain opcodes are marked with 
mnemonic superscripts. These superscripts and their meanings are defined in 
TABLE 7-1 on page 138. For preferred substitute instructions for deprecated opcodes, 
see the individual opcodes in Chapter 7 that are labeled “Deprecated”. 


In the tables in this appendix, reserved (—) and shaded entries (as defined below) 
indicate opcodes that are not implemented in UltraSPARC Architecture 2005 strands. 


An attempt to execute the opcode will cause an illegal_instruction exception.


An attempt to execute the opcode will cause an fp_exception_other exception with
FSR.ftt = 3 (unimplemented_FPop).








An attempt to execute a reserved opcode behaves as defined in Reserved Opcodes and 
Instruction Fields on page 134. 


TABLE A-1 op{1:0}











op{1:0}
0                     1       2                              3
Branches and SETHI    CALL    Arithmetic & Miscellaneous     Loads/Stores
(See TABLE A-2)               (See TABLE A-3)                (See TABLE A-4)





TABLE A-2 op2{2:0} (op = 0)

















op2{2:0}
0   ILLTRAP
1   BPcc (See TABLE A-7)
2   Bicc^D (See TABLE A-7)
3   BPr (bit 28 = 0) (See TABLE A-8); — (bit 28 = 1)¹
4   SETHI; NOP²
5   FBPfcc (See TABLE A-7)
6   FBfcc^D (See TABLE A-7)
7   —











1. See the footnote regarding bit 28 on page 162.
2. rd = 0, imm22 = 0
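Reading TABLE A-1 and TABLE A-2 amounts to extracting bit fields from the 32-bit instruction word. The sketch below uses the standard SPARC field positions (op in bits 31:30, rd in bits 29:25, op2 in bits 24:22, imm22 in bits 21:0); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard SPARC format fields of a 32-bit instruction word. */
static unsigned insn_op(uint32_t insn)    { return insn >> 30; }           /* bits 31:30 */
static unsigned insn_rd(uint32_t insn)    { return (insn >> 25) & 0x1Fu; } /* bits 29:25 */
static unsigned insn_op2(uint32_t insn)   { return (insn >> 22) & 0x7u; }  /* bits 24:22 */
static uint32_t insn_imm22(uint32_t insn) { return insn & 0x3FFFFFu; }     /* bits 21:0 */

/* TABLE A-2: op = 0 and op2 = 4 select SETHI. */
static bool is_sethi(uint32_t insn)
{
    return insn_op(insn) == 0 && insn_op2(insn) == 4;
}

/* Footnote 2 of TABLE A-2: SETHI with rd = 0 and imm22 = 0 is NOP. */
static bool is_nop(uint32_t insn)
{
    return is_sethi(insn) && insn_rd(insn) == 0 && insn_imm22(insn) == 0;
}
```

The word 01000000₁₆ decodes as NOP, while 03000001₁₆ is a SETHI with rd = 1 and imm22 = 1 and is therefore not a NOP; any word with op = 1 (such as 40000000₁₆) falls in the CALL column of TABLE A-1.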
TABLE A-3 op3{5:0} (op = 10₂) (1 of 2)





op3{5:4} 





0 1 2 





op3 
{3:0} 


0 


DIN 


ADD ADDcc TADDcc 


AND ANDcc TSUBcc 


oR forse — [poem 


XOR | |XORec | TSUBccTV" 


Bo pons — usc 


ANDN ANDNcc SLL (x = 0), SLLX (x = 1)


WRYP (rd 2 0) 

— (rd=1) 

WRCCR (rd = 2 

WRASI (rd = 3) 

— (rd= 4,5) 

SIRE (rd = 15, rs1 = 0, i - 1) 
— (rd = 15) and (rs1 #0 or i #1)) 
— (rd-7- 14) 

WRFPRS (rd - 6) 
WRasrP^5R (7 < rg < 14) 
WRPCRP (rd = 16) 

WRPIC (rd = 17) 

— (rd = 18) 

WRGSR (rd = 19) 
WRSOFTINT SET? (rd = 20) 
WRSOFTINT CLRP (rd = 21) 
WRSOFTINTP (rd = 22) 
WRTICK CMPRP (rd = 23) 
WRSTICKH (rd = 24) 
WRSTICK CMPRP (rd = 25) 
— (rd = 26 - 31) 

SAVED (fcn = 0) 
RESTORED? (fcn = 1) 
ALLCLEAN? (fcn = 2) 
OTHERW? (fcn = 3) 
NORMALWP (fcn = 4) 
INVALWP (fcn = 5) 

— (fcn 2 6) 


WRPR (rd = 0-14 or 16) 

— (rd = 15 or 17-31) 
RHPRP 

FPop1 (See TABLE A-5) 

FPop2 (See TABLE A-6) 





ORN ORNcc SRL (x = 0), SRLX (x = 1) 


IMPDEP1 (VIS) (See TABLE A-12)











NN O0! B| 0 





XNOR XNORcc SRA (x = 0), SRAX (x = 1) 
IMPDEP2


TABLE A-3 op3{5:0} (op = 10₂) (2 of 2)











op3{5:4} 
0 1 2 3 
8 |ADDC ADDCcc RDYP (rst 20, i- 0) JMPL 
— (rsi =1,i=0) 


RDCCR (rs1= 2, i=0) 
RDASI (rs1 = 3,i=0) 
RDTICK"?t (rs1 = 4, i =0) 
RDPC (rs1 = 5,i=0) 
RDFPRS (rs1 = 6, i =0) 
RDasrPAs® (7 < rd < 14, i = 0) 
MEMBAR (rs1 = 15, rd = 0, i=1, 
instruction bit 12 = 0) 
— (rsi =15, rd = 0, i=1, 
instruction bit 12 = 1) 

— (i = 1, (rs1 #15 or rd #0)) 

(rs = 15, rd = 0, i= 0) 
— (rs1 = 15 and rd > 0 and i = 0) 
RDPCR? (rs1 = 16 and i = 0) 
RDPIC (rs1 = 17 and i = 0) 
— (rs1 =18 and i = 0) 
RDGSR (rs1 = 19 and i = 0) 
— (rs1 = 20 or 21) and (i = 0)) 
RDSOFTINT? (rs1 = 22 and i = 0) 
op3 RDTICK_CMPR? (rs1 = 23 and i = 0) 
































(3:0) RDSTICK (rs! = 24 and i = 0) 
RDSTICK_CMPR? 
(rs1 = 25 and i= 0) 
— ((rs1 = 26 — 31) and (i = 0) 
9 [MULX = RDHPRH RETURN 
A (rs1 2 1-14 or 16) Tcc ((i 2 0 and inst{10:5} = 0) or 
(i = 1) and (inst{10:8} = 0))) 
(See TABLE A-7) 
— (rs1 = 15 or 17 - 31) — (bit 29 = 1) 
— ((i 2 0 and (inst{10:5} + 0)) or 
(i =1 and (inst{10:8} + 0)) 
B SMULccP FLUSHW FLUSH 
C SUBCcc MOVcc SAVE 
D = SDIVX RESTORE 
E UDIVec? POPC (rs1 = 0) DONE? (fen = 0) 
— (rs1 > 0) RETRY? (fcn = 1) 
— (fen = 2..15) 
— (fen = 16..31) 
F MOVr (See TABLE A-8) = 
op3 
{3:0} 
TABLE A-4 op3{5:0} (op = 11₂)


op3{5:4} 
eve lt me vee se =: 3 
mowa [LDF LDFAT™S! 
LDUBAPASI (rd = 0) LDFSRP E 
(rd = 1) LDXFSR 


LDUHAPASI LDQFAPASI 





LDTWAD. PASI 
LDTXA 
— (rd odd) 


LDDFAPASI 
LDBLOCKF 
LDSHORTF 





STWAPASI STFAPASI 





STFSRD, STXESR = 





























STQEATA 
STTWP ope STDF STDFAPASI 
— (rd odd) STLBLOCKF 
STPARTIALF 
STSHORTF 
Rest 
LDSB Reserved Reserved 
LDSH Reserved Reserved 
LDX Reserved Reserved 
Reserved Reserved — CASAPASI 
LDSTUB LDSTUBAPAST PREFETCH PREFETCHAPASI 
— (fcn = 5 — 15) — (fcn = 5 — 15) 
En SRR cas 
RETE 
TABLE A-5 opf{8:0} (op = 10₂, op3 = 34₁₆ = FPop1)


opf{3:0} 





opf{8:4} 
0016 


FMOVs 


FMOVd 


3 
FMOVq 


4 





0146 





0246 
0316 





0516 








0616 





0716 
0816 








0916 
0A46 





OB1g 
OD56 














OE46-1F4g 


0016 





FABSs 


FABSd 





0216 
0316 





FSQRTs


FSQRTd








0446 
0516 


FMULs 


FMULd 





FDIVd 








0616 
0716 


FsMULd 


FdMULq 





0816 
0916 
0A46 





0B16 
0C16 











0D16 
0E16-1F16 
TABLE A-6 opf{8:0} (op = 10₂, op3 = 35₁₆ = FPop2)


opf{3:0} 





opf(8:4) | 0 1 


FMOVa (fcc0) 


4 


5 6 7 


ti 











FMOVRsZ†  FMOVRdZ†  FMOVRqZ†





FMOVAa (fcc1) 





FCMPq 








FMOVq (fcc2) 














FMOVa (fcc3) 











FMOVgq (icc) 





FMOVq (xcc) 








* Reserved variation of FMOVR 








† bit 13 of instruction = 0
TABLE A-7 cond{3:0}

        BPcc     Bicc      FBPfcc    FBfcc       Tcc
        op = 0   op = 0    op = 0    op = 0      op = 2
        op2 = 1  op2 = 2   op2 = 5   op2 = 6     op3 = 3A₁₆
cond
0       BPN      BN^D      FBPN      FBN^D       TN
1       BPE      BE^D      FBPNE     FBNE^D      TE
2       BPLE     BLE^D     FBPLG     FBLG^D      TLE
3       BPL      BL^D      FBPUL     FBUL^D      TL
4       BPLEU    BLEU^D    FBPL      FBL^D       TLEU
5       BPCS     BCS^D     FBPUG     FBUG^D      TCS
6       BPNEG    BNEG^D    FBPG      FBG^D       TNEG
7       BPVS     BVS^D     FBPU      FBU^D       TVS
8       BPA      BA^D      FBPA      FBA^D       TA
9       BPNE     BNE^D     FBPE      FBE^D       TNE
A       BPG      BG^D      FBPUE     FBUE^D      TG
B       BPGE     BGE^D     FBPGE     FBGE^D      TGE
C       BPGU     BGU^D     FBPUGE    FBUGE^D     TGU
D       BPCC     BCC^D     FBPLE     FBLE^D      TCC
E       BPPOS    BPOS^D    FBPULE    FBULE^D     TPOS
F       BPVC     BVC^D     FBPO      FBO^D       TVC
TABLE A-8 Encoding of rcond{2:0} Instruction Field

              BPr          MOVr          FMOVr
              op = 0       op = 2        op = 2
              op2 = 3      op3 = 2F₁₆    op3 = 35₁₆
rcond{2:0}
0             —            —             —
1             BRZ          MOVRZ         FMOVR<s|d|q>Z
2             BRLEZ        MOVRLEZ       FMOVR<s|d|q>LEZ
3             BRLZ         MOVRLZ        FMOVR<s|d|q>LZ
4             —            —             —
5             BRNZ         MOVRNZ        FMOVR<s|d|q>NZ
6             BRGZ         MOVRGZ        FMOVR<s|d|q>GZ
7             BRGEZ        MOVRGEZ       FMOVR<s|d|q>GEZ
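Each rcond encoding in TABLE A-8 is a test of a register against zero. The sketch below evaluates it on a 64-bit signed register value. `rcond_eval` is a hypothetical helper, and returning -1 for the reserved encodings 0 and 4 (shown as "—") is an illustrative choice, not architected behavior.

```c
#include <stdint.h>

/* Evaluate an rcond{2:0} test on a 64-bit register value.
 * Returns 1 (true), 0 (false), or -1 for the reserved encodings 0 and 4. */
static int rcond_eval(unsigned rcond, int64_t r)
{
    switch (rcond & 7u) {
    case 1: return r == 0;  /* Z   */
    case 2: return r <= 0;  /* LEZ */
    case 3: return r < 0;   /* LZ  */
    case 5: return r != 0;  /* NZ  */
    case 6: return r > 0;   /* GZ  */
    case 7: return r >= 0;  /* GEZ */
    default: return -1;     /* 0 and 4: reserved */
    }
}
```

The same evaluation applies to the BPr, MOVr, and FMOVr columns; the three instruction groups differ only in what they do when the test is true.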
TABLE A-9 cc / opf_cc Fields (MOVcc and FMOVcc)

opf_cc              Condition Code
cc2   cc1   cc0     Selected
0     0     0       fcc0
0     0     1       fcc1
0     1     0       fcc2
0     1     1       fcc3
1     0     0       icc
1     0     1       —
1     1     0       xcc
1     1     1       —
































TABLE A-10 cc Fields (FBPfcc, FCMP, and FCMPE)

cc1   cc0   Condition Code Selected
0     0     fcc0
0     1     fcc1
1     0     fcc2
1     1     fcc3














TABLE A-11 cc Fields (BPcc and Tcc)

cc1   cc0   Condition Code Selected
0     0     icc
0     1     —
1     0     xcc
1     1     —
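TABLE A-10 and TABLE A-11 map the same two cc bits to different condition-code registers depending on the instruction group. A small sketch follows; the enum and function names are hypothetical.

```c
/* Condition-code register selected by an instruction's cc fields. */
enum cc_sel { CC_FCC0, CC_FCC1, CC_FCC2, CC_FCC3, CC_ICC, CC_XCC, CC_RESERVED };

/* TABLE A-10 (FBPfcc, FCMP, FCMPE): cc1:cc0 selects fcc0 through fcc3. */
static enum cc_sel fbpfcc_cc(unsigned cc1, unsigned cc0)
{
    return (enum cc_sel)(CC_FCC0 + (((cc1 & 1u) << 1) | (cc0 & 1u)));
}

/* TABLE A-11 (BPcc, Tcc): 00 selects icc, 10 selects xcc, others reserved. */
static enum cc_sel bpcc_cc(unsigned cc1, unsigned cc0)
{
    if ((cc0 & 1u) != 0)
        return CC_RESERVED;
    return (cc1 & 1u) ? CC_XCC : CC_ICC;
}
```

For example, cc1:cc0 = 10₂ selects fcc2 in an FBPfcc but xcc in a BPcc.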
TABLE A-12 IMPDEP1: opf{8:0} for VIS opcodes (op = 10₂, op3 = 36₁₆)








opf {8:4} 





01 03 
ARRAYS 


ARRAYS 


05 


06 


FZERO 


FZERO 


08 





FZEROS 





ARRAY16 


FNOR 
FNORS 








FXNOR 
FXNORS 





ARRAY32 FCMPLE32 


FMUL 
8x16AL 


FPSUB32 


FANDNOT2 


FSRC1 


FORNOT2 





FPSUB32S 


FORNOT2S 





ALIGN 
ADDRESS 








ALIGN 


6 
7 
8 
FMULD 
8ULx16 
FCMPEQ16 |FPACK32 
ADDRESS 
LITTLE 


FPMERGE 


FANDNOT1 


FN 





FSRC2 


FANDNOTIS|FSRC2S 








B |EDGE32LN — — FPACK16 
C — = FCMPGT32 — 


BSHUFFLE 
FEXPAND 


FORNOTIS 
FOR 











FPACKFIX 
FCMPEQ22 |PDIST 
F A. m 








FNANDS 





FORS 
FONE 


FONES 
TABLE A-14 IMPDEP1: opf{8:0} for VIS opcodes (op = 10₂, op3 = 36₁₆) (3 of 3)


opf {8:4} 
APPENDIX B


Note: This chapter is undergoing final review; please check back later for a copy of
UltraSPARC Architecture 2005 containing the final version of this chapter.


Implementation Dependencies 





This appendix summarizes implementation dependencies in the SPARC V9
standard. In SPARC V9, the notation “IMPL. DEP. #nn:” identifies the definition of
an implementation dependency; the notation “(impl. dep. #nn)” identifies a reference
to an implementation dependency. These dependencies are described by their
number nn in TABLE B-1 on page 587.


The appendix contains these sections:


• Definition of an Implementation Dependency on page 585.
• Hardware Characteristics on page 586.
• Implementation Dependency Categories on page 586.
• List of Implementation Dependencies on page 587.





B.1 Definition of an Implementation Dependency


The SPARC V9 architecture is a model that specifies unambiguously the behavior 
observed by software on SPARC V9 systems. Therefore, it does not necessarily 
describe the operation of the hardware of any actual implementation. 


An implementation is not required to execute every instruction in hardware. An 
attempt to execute a SPARC V9 instruction that is not implemented in hardware 
generates a trap. Whether an instruction is implemented directly by hardware, 
simulated by software, or emulated by firmware is implementation dependent. 
The two levels of SPARC V9 compliance are described in UltraSPARC Architecture
2005 Compliance with SPARC V9 Architecture on page 25. 


Some elements of the architecture are defined to be implementation dependent. 
These elements include certain registers and operations that may vary from 
implementation to implementation; they are explicitly identified as such in this 
appendix. 


Implementation elements (such as instructions or registers) that appear in an 
implementation but are not defined in this document (or its updates) are not 
considered to be SPARC V9 elements of that implementation. 





B.2 Hardware Characteristics


Hardware characteristics that do not affect the behavior observed by software on 
SPARC V9 systems are not considered architectural implementation dependencies. A 
hardware characteristic may be relevant to the user system design (for example, the 
speed of execution of an instruction) or may be transparent to the user (for example, 
the method used for achieving cache consistency). The SPARC International 
document, Implementation Characteristics of Current SPARC V9-based Products, Revision 
9.x, provides a useful list of these hardware characteristics, along with the list of 
implementation-dependent design features of SPARC V9-compliant 
implementations. 


In general, hardware characteristics deal with
• Instruction execution speed
• Whether instructions are implemented in hardware
• The nature and degree of concurrency of the various hardware units constituting
a SPARC V9 implementation





B.3 Implementation Dependency Categories


Many of the implementation dependencies can be grouped into four categories,
abbreviated by their first letters throughout this appendix:

• Value (v)
The semantics of an architectural feature are well defined, except that a value
associated with the feature may differ across implementations. A typical example
is the number of implemented register windows (impl. dep. #2-V8).

• Assigned Value (a)
The semantics of an architectural feature are well defined, except that a value
associated with the feature may differ across implementations and the actual
value is assigned by SPARC International. Typical examples are the impl field of
the Version register (VER) (impl. dep. #13-V8) and the FSR.ver field (impl. dep.
#19-V8).

• Functional Choice (f)
The SPARC V9 architecture allows implementors to choose among several
possible semantics related to an architectural function. A typical example is the
treatment of a catastrophic error exception, which may cause either a deferred or
a disrupting trap (impl. dep. #31-V8-Cs10).

• Total Unit (t)
The existence of the architectural unit or function is recognized, but details are
left to each implementation. Examples include the handling of I/O registers
(impl. dep. #7-V8) and some alternate address spaces (impl. dep. #29-V8).





B.4 List of Implementation Dependencies


TABLE B-1 provides a complete list of the SPARC V9 implementation dependencies.
The Page column lists the page for the context in which the dependency is defined;
bold face indicates the main page on which the implementation dependency is
described.


TABLE B-1 SPARC V9 Implementation Dependencies (1 of 10)





Nbr Category Description Page

1-V8 f Software emulation of instructions 25
Whether an instruction complies with UltraSPARC Architecture 2005 by being
implemented directly by hardware, simulated by software, or emulated by firmware is
implementation dependent.

2-V8 v Number of IU registers 26, 51
An UltraSPARC Architecture implementation may contain from 72 to 640 general-purpose
64-bit R registers. This corresponds to a grouping of the registers into
MAXGL + 1 sets of global R registers plus a circular stack of N_REG_WINDOWS sets of 16
registers each, known as register windows. The number of register windows present
(N_REG_WINDOWS) is implementation dependent, within the range of 3 to 32
(inclusive).

3-V8 f Incorrect IEEE Std 754-1985 results 134
An implementation may indicate that a floating-point instruction did not produce a
correct IEEE Std 754-1985 result by generating an fp_exception_other exception with
FSR.ftt = unfinished_FPop or FSR.ftt = unimplemented_FPop. In this case, software
running in a higher privilege mode shall emulate any functionality not present in the
hardware.

4, 5 Reserved.
TABLE B-1 SPARC V9 Implementation Dependencies (2 of 10)

Nbr Category Description Page

6-V8 f I/O registers privileged status 29
Whether I/O registers can be accessed by nonprivileged code is implementation
dependent.

7-V8 t I/O register definitions 29
The contents and addresses of I/O registers are implementation dependent.

8-V8-Cs20 RDasr/WRasr target registers 31, 70, 303, 377
Ancillary state registers (ASRs) in the range 0-27 that are not defined in UltraSPARC
Architecture 2005 are reserved for future architectural use. ASRs in the range 28-31 are
available to be used for implementation-dependent purposes.

9-V8-Cs20 RDasr/WRasr privileged status 31, 70, 303, 377
The privilege level required to execute each of the implementation-dependent
read/write ancillary state register instructions (for ASRs 28-31) is implementation
dependent.

10-V8-12-V8 Reserved.

13-V8 a HVER.impl 109
HVER.impl uniquely identifies an implementation or class of software-compatible
implementations of the architecture. Values FFF0₁₆ to FFFF₁₆ are reserved and are not
available for assignment.

14-V8-15-V8 Reserved.

16-V8-Cu3 Reserved.

17-V8 Reserved.

18-V8-Ms10 f Nonstandard IEEE 754-1985 results 63, 390
UltraSPARC Architecture 2005 implementations do not implement a nonstandard
floating-point mode. FSR.ns is a reserved bit; it always reads as 0 and writes to it are
ignored.

19-V8 a FPU version, FSR.ver 63
Bits 19:17 of the FSR, FSR.ver, identify one or more implementations of the FPU
architecture.

20-V8-21-V8 Reserved.

22-V8 f FPU tem, cexc, and aexc 70
An UltraSPARC Architecture implementation implements the tem, cexc, and aexc
fields in hardware, conformant to IEEE Std 754-1985.

23-V8 Reserved.

24-V8 Reserved.

25-V8 f RDPR of FQ with nonexistent FQ 66, 308
An UltraSPARC Architecture implementation does not contain a floating-point queue
(FQ). Therefore, FSR.ftt = 4 (sequence_error) does not occur, and an attempt to read
the FQ with the RDPR instruction causes an illegal_instruction exception.

26-V8-28-V8 Reserved.
TABLE B-1 SPARC V9 Implementation Dependencies (3 of 10)

Nbr Category Description Page

29-V8 t Address space identifier (ASI) definitions 123
In SPARC V9, many ASIs were defined to be implementation dependent. Some of
those ASIs have been allocated for standard uses in the UltraSPARC Architecture.
Others remain implementation dependent in the UltraSPARC Architecture. See ASI
Assignments on page 422 and Block Load and Store ASIs on page 443 for details.

30-V8-Cu3 f ASI address decoding 123
In SPARC V9, an implementation could choose to decode only a subset of the 8-bit ASI
specifier. In UltraSPARC Architecture implementations, all 8 bits of each ASI specifier
must be decoded. Refer to Chapter 10, Address Space Identifiers (ASIs), of this
specification for details.

31-V8-Cs10 f This implementation dependency is no longer used in the UltraSPARC Architecture,
since "catastrophic" errors are now handled using normal error-reporting mechanisms.

32-V8-Ms10 t Restartable deferred traps 463
Whether any restartable deferred traps (and associated deferred-trap queues) are
present is implementation dependent.

33-V8-Cs10 f Trap precision 467
In an UltraSPARC Architecture implementation, all exceptions that occur as the result
of program execution, except for store_error, are precise.

34-V8 f Interrupt clearing 513
a: The method by which an interrupt is removed is now defined in the UltraSPARC
Architecture (see Clearing the Software Interrupt Register on page 513).
b: How quickly a virtual processor responds to an interrupt request, like all
timing-related issues, is implementation dependent.

35-V8-Cs20 t Implementation-dependent traps 473
Trap type (TT) values 060₁₆ to 07F₁₆ were reserved for
implementation_dependent_exception_n exceptions in SPARC V9 but are now all
defined as standard UltraSPARC Architecture exceptions.

36-V8 f Trap priorities 485
The relative priorities of traps defined in the UltraSPARC Architecture are fixed.
However, the absolute priorities of those traps are implementation dependent (because
a future version of the architecture may define new traps). The priorities (both
absolute and relative) of any new traps are implementation dependent.

37-V8 f Reset trap 466
Some of a virtual processor's behavior during a reset trap is implementation
dependent.

38-V8 f Effect of reset trap on implementation-dependent registers 490
Implementation-dependent registers may or may not be affected by the various reset
traps.
39-V8-Cs10 f Entering error_state on implementation-dependent errors 460, 493
The virtual processor enters error_state when a trap occurs while the virtual
processor is already at its maximum supported trap level; that is, it enters
error_state when a trap occurs while TL = MAXTL. No other conditions cause entry
into error_state on an UltraSPARC Architecture virtual processor.

40-V8 f error_state virtual processor state 460
Effects when error_state is entered are implementation dependent, but it is
recommended that as much virtual processor state as possible be preserved upon
entry to error_state. In addition, an UltraSPARC Architecture virtual processor
may have other error_state entry traps that are implementation dependent.

41-V8 Reserved.

42-V8-Cs10 t, f, v FLUSH instruction
FLUSH is implemented in hardware in all UltraSPARC Architecture 2005
implementations, so it never causes a trap as an unimplemented instruction.

43-V8 Reserved.

44-V8-Cs10 f Data access FPU trap 252, 256, 274
a: If a load floating-point instruction generates an exception that causes a non-precise
trap, it is implementation dependent whether the contents of the destination
floating-point register(s) or floating-point state register are undefined or are
guaranteed to remain unchanged.
b: If a load floating-point alternate instruction generates an exception that causes a
non-precise trap, it is implementation dependent whether the contents of the
destination floating-point register(s) are undefined or are guaranteed to remain
unchanged.

45-V8-46-V8 Reserved.

47-V8-Cs20 t RDasr 304
RDasr instructions with rs1 in the range 28-31 are available for implementation-dependent
uses (impl. dep. #8-V8-Cs20). For an RDasr instruction with rs1 in the
range 28-31, the following are implementation dependent:
• the interpretation of bits 13:0 and 29:25 in the instruction
• whether the instruction is nonprivileged or privileged or hyperprivileged (impl.
dep. #9-V8-Cs20)
• whether an attempt to execute the instruction causes an illegal_instruction exception

48-V8-Cs20 t WRasr 377
WRasr instructions with rd in the range 26-31 are available for implementation-dependent
uses (impl. dep. #8-V8-Cs20). For a WRasr instruction with rd in the range
26-31, the following are implementation dependent:
• the interpretation of bits 18:0 in the instruction
• the operation(s) performed (for example, xor) to generate the value written to the
ASR
• whether the instruction is nonprivileged or privileged or hyperprivileged (impl.
dep. #9-V8-Cs20)
• whether an attempt to execute the instruction causes an illegal_instruction exception

49-V8-54-V8 Reserved.
55-V8-Cs10 f Tininess detection 69
In SPARC V9, it is implementation dependent whether "tininess" (an IEEE 754 term) is
detected before or after rounding. In all UltraSPARC Architecture implementations,
tininess is detected before rounding.

56-100 Reserved.

101-V9-Cs10 Maximum trap level (MAXPTL, MAXTL) 100, 102
The architectural parameter MAXPTL is a constant for each implementation; its legal
values are from 2 to 6 (supporting from 2 to 6 levels of saved trap state visible to
privileged software). In a typical implementation MAXPTL = MAXPGL (see impl. dep.
#401-S10).
The architectural parameter MAXTL is a constant for each implementation; its legal
values are from 3 to 7 (supporting from 3 to 7 levels of saved trap state).
Architecturally, MAXPTL must be ≥ 2, MAXTL must be > 4, and MAXTL must be > MAXPTL.

102-V9 Clean windows trap 498
An implementation may choose either to implement automatic "cleaning" of register
windows in hardware or to generate a clean_window trap, when needed, for
window(s) to be cleaned by software.

103-V9-Ms10 Prefetch instructions 296, 299, 301
The following aspects of the PREFETCH and PREFETCHA instructions are
implementation dependent:
a: the attributes of the block of memory prefetched: its size (minimum = 64 bytes)
and its alignment (minimum = 64-byte alignment)
b: whether each defined prefetch variant is implemented (1) as a NOP, (2) with its
full semantics, or (3) with common-case prefetching semantics
c: whether and how variants 16, 18, 19, and 24-31 are implemented; if not
implemented, a variant must execute as a NOP
The following aspects of the PREFETCH and PREFETCHA instructions used to be (but
are no longer) implementation dependent:
d: while in nonprivileged mode (PSTATE.priv = 0 and HPSTATE.hpriv = 0), an attempt
to reference an ASI in the range 00₁₆..7F₁₆ by a PREFETCHA instruction executes
as a NOP; specifically, it does not cause a privileged_action exception.
e: PREFETCH and PREFETCHA have no observable effect in privileged code.
f: in UltraSPARC Architecture 2005, neither PREFETCH nor PREFETCHA can cause a
data access MMU miss exception (because a Strong prefetch is treated the same
as a Weak prefetch).
g: while in privileged mode (PSTATE.priv = 1 and HPSTATE.hpriv = 0), an attempt to
reference an ASI in the range 30₁₆..7F₁₆ by a PREFETCHA instruction executes as a
NOP; specifically, it does not cause a privileged_action exception.
104-V9 a HVER.manuf 109
HVER.manuf contains a 16-bit semiconductor manufacturer code. This field is optional
and, if not present, reads as zero. HVER.manuf may indicate the original supplier of a
second-sourced processor in cases involving mask-level second-sourcing. It is
intended that the contents of HVER.manuf track the JEDEC semiconductor
manufacturer code as closely as possible. If the manufacturer does not have a JEDEC
semiconductor manufacturer code, then SPARC International will assign an
HVER.manuf value.

105-V9 f TICK register 76
a: If an accurate count cannot always be returned when TICK is read, any inaccuracy
should be small, bounded, and documented.
b: An implementation may implement fewer than 63 bits in TICK.counter; however,
the counter as implemented must be able to count for at least 10 years without
overflowing. Any upper bits not implemented must read as 0.

106-V9 f IMPDEP2A instructions 238
The IMPDEP2A instructions are completely implementation dependent.
Implementation-dependent aspects include their operation, the interpretation of bits
29:25 and 18:0 in their encodings, and which (if any) exceptions they may cause.

107-V9 f Unimplemented LDTW(A) trap 265, 268
a: It is implementation dependent whether LDTW is implemented in hardware. If
not, an attempt to execute an LDTW instruction will cause an
unimplemented_LDTW exception.
b: It is implementation dependent whether LDTWA is implemented in hardware. If
not, an attempt to execute an LDTWA instruction will cause an
unimplemented_LDTW exception.

108-V9 f Unimplemented STTW(A) trap 352, 355
a: It is implementation dependent whether STTW is implemented in hardware. If not,
an attempt to execute an STTW instruction will cause an unimplemented_STTW
exception.
b: It is implementation dependent whether STTWA is implemented in hardware. If not,
an attempt to execute an STTWA instruction will cause an unimplemented_STTW
exception.
Nbr Category Description Page

109-V9-Cs10 f LDDF(A)_mem_address_not_aligned 116, 252, 255, 503
a: LDDF requires only word alignment. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) LDDF instruction may cause an
LDDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the LDDF instruction and return.
(In an UltraSPARC Architecture processor, the LDDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDDF
instruction.)
b: LDDFA requires only word alignment. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) LDDFA instruction may cause an
LDDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the LDDFA instruction and return.
(In an UltraSPARC Architecture processor, the LDDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDDFA
instruction.)

110-V9-Cs10 f STDF(A)_mem_address_not_aligned 116, 339, 342, 504
a: STDF requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) STDF instruction may cause an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
must emulate the STDF instruction and return.
(In an UltraSPARC Architecture processor, the STDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STDF
instruction.)
b: STDFA requires only word alignment in memory. However, if the effective address
is word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) STDFA instruction may cause an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
must emulate the STDFA instruction and return.
(In an UltraSPARC Architecture processor, the STDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STDFA
instruction.)
Nbr Category Description Page

111-V9-Cs10 f LDQF(A)_mem_address_not_aligned 116, 117, 252, 255, 506
a: LDQF requires only word alignment. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an LDQF instruction may
cause an LDQF_mem_address_not_aligned exception. In this case, the trap handler
software must emulate the LDQF instruction and return.
(In an UltraSPARC Architecture processor, the LDQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDQF
instruction.)
(This exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the LDQF instruction in
hardware.)
b: LDQFA requires only word alignment. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an LDQFA instruction
may cause an LDQF_mem_address_not_aligned exception. In this case, the trap
handler software must emulate the LDQFA instruction and return.
(In an UltraSPARC Architecture processor, the LDQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDQFA
instruction.)
(This exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the LDQFA instruction in
hardware.)

112-V9-Cs10 f STQF(A)_mem_address_not_aligned 117, 340, 342, 506
a: STQF requires only word alignment in memory. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQF instruction
may cause an STQF_mem_address_not_aligned exception. In this case, the trap
handler software must emulate the STQF instruction and return.
(In an UltraSPARC Architecture processor, the STQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STQF
instruction.)
(This exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the STQF instruction in
hardware.)
b: STQFA requires only word alignment in memory. However, if the effective address
is word-aligned but not quadword-aligned, an attempt to execute an STQFA
instruction may cause an STQF_mem_address_not_aligned exception. In this case,
the trap handler software must emulate the STQFA instruction and return.
(In an UltraSPARC Architecture processor, the STQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STQFA
instruction.)
(This exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the STQFA instruction in
hardware.)
113-V9-Ms10 f Implemented memory models 96, 410
Whether memory models represented by PSTATE.mm = 10₂ or 11₂ are supported in an
UltraSPARC Architecture processor is implementation dependent. If the 10₂ model is
supported, then when PSTATE.mm = 10₂, the implementation must correctly execute
software that adheres to the RMO model described in The SPARC Architecture Manual-Version 9.
If the 11₂ model is supported, its definition is implementation dependent.

114-V9-Cs10 f RED_state trap vector address (RSTVADDR) 471, 567
The RED_state trap vector is located at an address referred to as RSTVADDR. In the
UltraSPARC Architecture, RSTVADDR is bound to the following address:
Physical Address FFFF FFFF F000 0000₁₆
(the highest 256 MB of physical address space)
In an implementation that implements fewer than 64 bits of physical addressing,
unimplemented high-order bits of the above RSTVADDR are ignored.

115-V9 f RED_state 459
What occurs after the processor enters RED_state is implementation dependent.

116-V9 f SIR_enable control flag 326
SPARC V9 states that the location of the SIR_enable control flag and the means by
which it is accessed are implementation dependent. In UltraSPARC Architecture
virtual processors, the SIR_enable control flag does not explicitly exist; the SIR
instruction always generates an illegal_instruction exception in nonprivileged and
privileged modes. SIR causes a software-initiated reset trap only when executed in
hyperprivileged mode.

118-V9 f Identifying I/O locations 402
The manner in which I/O locations are identified is implementation dependent.

119-V9-Ms10 f Unimplemented values for PSTATE.mm 97, 411
The effect of an attempt to write an unsupported memory model designation into
PSTATE.mm is implementation dependent; however, it should never result in a value
of PSTATE.mm greater than the one that was written. In the case of an
UltraSPARC Architecture implementation that only supports the TSO memory model,
PSTATE.mm always reads as zero and attempts to write to it are ignored.

120-V9 f Coherence and atomicity of memory operations 402
The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent.

121-V9 f Implementation-dependent memory model 402
An implementation may choose to identify certain addresses and use an
implementation-dependent memory model for references to them.

122-V9 f FLUSH latency 188, 419
The latency between the execution of FLUSH on one virtual processor and the point at
which the modified instructions have replaced outdated instructions in a
multiprocessor is implementation dependent.

123-V9 f Input/output (I/O) semantics 29
The semantic effect of accessing I/O registers is implementation dependent.

APPENDIX B * Implementation Dependencies 595 


TABLE B-1 SPARC V9 Implementation Dependencies (10 of 10) 





Nbr Category Description Page 
124-V9 v Implicit ASI when TL > 0 405
In SPARC V9, when TL > 0, the implicit ASI for instruction fetches, loads, and stores is
implementation dependent. In all UltraSPARC Architecture implementations, when
TL > 0, the implicit ASI for instruction fetches is ASI_NUCLEUS; loads and stores
use ASI_NUCLEUS if PSTATE.cle = 0 or ASI_NUCLEUS_LITTLE if PSTATE.cle = 1.

125-V9-Cs10 f Address masking 98, 164, 241, 304, 488
(1) When PSTATE.am = 1, only the less-significant 32 bits of the PC register are stored
in the specified destination register(s) in CALL, JMPL, and RDPC instructions, while
the more-significant 32 bits of the destination register(s) are set to 0.
(2) When PSTATE.am = 1, during a trap, only the less-significant 32 bits of the PC and
NPC are stored (respectively) to TPC[TL] and TNPC[TL]; the more-significant 32 bits
of TPC[TL] and TNPC[TL] are set to 0.

126-V9-Ms10 Register Windows State registers width 86
Privileged registers CWP, CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN
contain values in the range 0 to N_REG_WINDOWS - 1. An attempt to write a value
greater than N_REG_WINDOWS - 1 to any of these registers causes an
implementation-dependent value between 0 and N_REG_WINDOWS - 1 (inclusive) to be
written to the register. Furthermore, an attempt to write a value greater than
N_REG_WINDOWS - 2 violates the register window state definition in Register Window
Management Instructions on page 131.

Although the width of each of these five registers is architecturally 5 bits, the width is
implementation dependent and shall be between ⌈log₂(N_REG_WINDOWS)⌉ and 5 bits,
inclusive. If fewer than 5 bits are implemented, the unimplemented upper bits shall
read as 0 and writes to them shall have no effect. All five registers should have the
same width.

For UltraSPARC Architecture 2005 processors, N_REG_WINDOWS = 8. Therefore, each
register window state register is implemented with 3 bits, the maximum value for
CWP and CLEANWIN is 7, and the maximum value for CANSAVE, CANRESTORE,
and OTHERWIN is 6. When these registers are written by the WRPR instruction, bits
63:3 of the data written are ignored.

127-199 Reserved.
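The register-count range in impl. dep. #2-V8 follows from simple arithmetic. This sketch assumes, as in SPARC V9, that each global set holds 8 registers (%g0 to %g7). It also assumes each window contributes 16 registers of its own: 8 locals plus 8 ins, because a window's outs overlap the ins of the adjacent window. The function name is hypothetical. The parameter values in the usage note are chosen only to show that the 72 and 640 endpoints are reachable; they do not describe any particular processor.

```c
#include <assert.h>

/* Number of general-purpose R registers implied by impl. dep. #2-V8:
   MAXGL + 1 sets of 8 global registers, plus N_REG_WINDOWS windows
   that each contribute 16 registers of their own (8 locals + 8 ins). */
unsigned total_r_registers(unsigned maxgl, unsigned n_reg_windows) {
    /* N_REG_WINDOWS must lie in the range 3 to 32, inclusive. */
    assert(n_reg_windows >= 3 && n_reg_windows <= 32);
    return 8u * (maxgl + 1u) + 16u * n_reg_windows;
}
```

For example, `total_r_registers(2, 3)` gives 8 × 3 + 16 × 3 = 72, and `total_r_registers(15, 32)` gives 8 × 16 + 16 × 32 = 640.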
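Impl. deps. #13-V8, #19-V8, and #104-V9 all concern identification fields that software reads out of registers. The following sketch shows the bit-field extraction involved. The FSR.ver position (bits 19:17) comes from TABLE B-1. The HVER positions (manuf in bits 63:48, impl in bits 47:32) are an assumption carried over from the SPARC V9 VER register layout and should be checked against the HVER register description. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return bits hi..lo (inclusive) of a 64-bit register value. */
uint64_t reg_bits(uint64_t value, unsigned hi, unsigned lo) {
    assert(hi < 64 && hi >= lo);
    unsigned width = hi - lo + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (value >> lo) & mask;
}

/* FSR.ver occupies FSR bits 19:17 (impl. dep. #19-V8). */
unsigned fsr_ver(uint64_t fsr) { return (unsigned)reg_bits(fsr, 19, 17); }

/* ASSUMPTION: HVER uses the SPARC V9 VER layout for these two fields. */
unsigned hver_manuf(uint64_t hver) { return (unsigned)reg_bits(hver, 63, 48); }
unsigned hver_impl(uint64_t hver)  { return (unsigned)reg_bits(hver, 47, 32); }

/* HVER.impl values FFF0 to FFFF (hex) are reserved (impl. dep. #13-V8). */
int hver_impl_is_reserved(unsigned impl) {
    return impl >= 0xFFF0u && impl <= 0xFFFFu;
}
```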
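Impl. dep. #105-V9 allows TICK.counter to be narrower than 63 bits, provided it can count for at least 10 years without overflowing. The minimum width therefore depends on the counting rate. The sketch below computes it. The 1.2 GHz rate in the usage note and the 365.25-day year are illustrative assumptions, not values taken from this specification, and the function names are hypothetical.

```c
#include <stdint.h>

/* Seconds in 10 years of 365.25 days (an assumption for this sketch):
   10 * 365.25 * 86400 = 315,576,000. */
#define TEN_YEARS_SECONDS (10ULL * 36525ULL * 86400ULL / 100ULL)

/* Number of counter increments in 10 years at a given rate (Hz). */
uint64_t ticks_in_ten_years(uint64_t hz) { return hz * TEN_YEARS_SECONDS; }

/* Smallest width w such that a w-bit counter can reach `ticks`
   without wrapping, i.e. 2^w - 1 >= ticks. */
unsigned min_counter_bits(uint64_t ticks) {
    unsigned w = 0;
    while (w < 64 && ticks > ((1ULL << w) - 1))
        w++;
    return w;
}
```

At 1.2 GHz, `ticks_in_ten_years` returns 378,691,200,000,000,000, and `min_counter_bits` of that value is 59. That is comfortably within the 63-bit TICK.counter limit.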
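Impl. dep. #114-V9-Cs10 fixes RSTVADDR at physical address FFFF FFFF F000 0000₁₆ and says that unimplemented high-order bits are ignored. This sketch checks both statements numerically. The distance from RSTVADDR to the top of the 64-bit space is 256 MB, and truncating to a narrower physical address width keeps only the implemented low-order bits. The function names are hypothetical, and the 40-bit width in the usage note is a hypothetical example.

```c
#include <assert.h>
#include <stdint.h>

#define RSTVADDR 0xFFFFFFFFF0000000ULL /* RED_state trap vector (physical) */

/* RSTVADDR as seen by an implementation with `pa_bits` physical address
   bits: the unimplemented high-order bits are dropped. */
uint64_t rstvaddr_for_pa_bits(unsigned pa_bits) {
    assert(pa_bits >= 1 && pa_bits <= 64);
    if (pa_bits == 64)
        return RSTVADDR;
    return RSTVADDR & ((1ULL << pa_bits) - 1);
}

/* Bytes from RSTVADDR to the end of the 64-bit physical address space,
   computed as 2^64 - RSTVADDR using unsigned wraparound. */
uint64_t rstv_region_bytes(void) { return 0ULL - RSTVADDR; }
```

With 40 implemented physical address bits, `rstvaddr_for_pa_bits(40)` yields FF F000 0000₁₆, and `rstv_region_bytes()` yields 1000 0000₁₆ (256 MB).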
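Impl. dep. #124-V9 can be summarized as a small selection function. The TL > 0 cases come directly from TABLE B-1. Two parts of the sketch are assumptions drawn from the SPARC V9 ASI assignments, included only for completeness: the TL = 0 cases (ASI_PRIMARY, or ASI_PRIMARY_LITTLE when PSTATE.cle = 1 for loads and stores), and the numeric ASI values. The function names are hypothetical.

```c
/* SPARC V9 ASI numbers (assumed here; see the ASI assignment tables). */
#define ASI_NUCLEUS         0x04u
#define ASI_NUCLEUS_LITTLE  0x0Cu
#define ASI_PRIMARY         0x80u
#define ASI_PRIMARY_LITTLE  0x88u

/* Implicit ASI for instruction fetches at trap level `tl`. */
unsigned implicit_fetch_asi(unsigned tl) {
    return tl > 0 ? ASI_NUCLEUS : ASI_PRIMARY;
}

/* Implicit ASI for loads and stores; pstate_cle is PSTATE.cle (0 or 1). */
unsigned implicit_data_asi(unsigned tl, unsigned pstate_cle) {
    if (tl > 0)
        return pstate_cle ? ASI_NUCLEUS_LITTLE : ASI_NUCLEUS;
    return pstate_cle ? ASI_PRIMARY_LITTLE : ASI_PRIMARY;
}
```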
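Impl. dep. #126-V9-Ms10 sets the minimum width of CWP, CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN at ⌈log₂(N_REG_WINDOWS)⌉ bits. The sketch computes that bound. It also models the UltraSPARC Architecture 2005 case: N_REG_WINDOWS = 8, each register is 3 bits wide, and WRPR ignores bits 63:3 of the data written. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum implemented width in bits: ceil(log2(N_REG_WINDOWS)). */
unsigned min_window_reg_width(unsigned n_reg_windows) {
    assert(n_reg_windows >= 3 && n_reg_windows <= 32);
    unsigned w = 0;
    while ((1u << w) < n_reg_windows)
        w++;
    return w;
}

/* UltraSPARC Architecture 2005 (N_REG_WINDOWS = 8): a WRPR to one of the
   five window state registers keeps only bits 2:0 of the data written.
   Note that a result of 7 in CANSAVE, CANRESTORE, or OTHERWIN would still
   exceed their architectural maximum of 6. */
uint64_t ua2005_window_reg_after_wrpr(uint64_t data) {
    return data & 0x7;
}
```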


TABLE B-2 provides a list of implementation dependencies that, in addition to those 
in TABLE B-1, apply to UltraSPARC Architecture processors. Bold face indicates the 
main page on which the implementation dependency is described. See Appendix C 
in the Extensions Documents for further information. 
200-201 Reserved.

202-U3 fast_ECC_error trap 506
Whether or not a fast_ECC_error trap exists is implementation dependent. If it does exist,
it indicates that an ECC error was detected in an external cache, and its trap type is 070₁₆.

203-U3-Cs10 Dispatch Control register (DCR) bits 13:6 and 1
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.
Nbr Description Page

204-U3-Cs10 DCR bits 5:3 and 0
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

205-U3-Cs10 Instruction Trap Register
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

206-U3-Cs10 SHUTDOWN instruction 324
On an UltraSPARC Architecture implementation executing in privileged or
hyperprivileged mode, SHUTDOWN behaves like a NOP.

207-U3 PCR register bits 47:32, 26:17, and 3 78
The values and semantics of bits 47:32, 26:17, and bit 3 of the PCR register are
implementation dependent.

208-U3 Ordering of errors captured in instruction execution
The order in which errors are captured in instruction execution is implementation
dependent. Ordering may be in program order or in order of detection.

209-U3 Software intervention after instruction-induced error
The precision of the trap that signals an instruction-induced error whose recovery requires
software intervention is implementation dependent.

210-U3 ERROR output signal
The following aspects of the ERROR output signal are implementation dependent in the
UltraSPARC Architecture:
• The causes of the ERROR signal
• Whether each of the causes of the ERROR signal, when it generates the ERROR signal,
halts the virtual processor or allows the virtual processor to continue running
• The exact semantics of the ERROR signal

211-U3 Error logging registers' information
The information that the error logging registers preserve beyond the reset induced by an
ERROR signal is implementation dependent.

212-U3-Cs10 Trap with fatal error
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

213-U3 AFSR.priv
The existence of the AFSR.priv bit is implementation dependent. If AFSR.priv is
implemented, it is implementation dependent whether the logged AFSR.priv indicates the
privileged state upon the detection of an error or upon the execution of an instruction that
induces the error. For the former implementation to be effective, operating software must
provide error barriers appropriately.

214-U3 Enable/disable control for deferred traps
Whether an implementation provides an enable/disable control feature for deferred traps
is implementation dependent.

215-U3 Error barrier
DONE and RETRY instructions may implicitly provide an error barrier function, as
MEMBAR #Sync does. Whether DONE and RETRY instructions provide an error barrier is
implementation dependent.
Nbr Description Page

216-U3 data_access_error trap precision
The precision of a data_access_error trap is implementation dependent.

217-U3 instruction_access_error trap precision
The precision of an instruction_access_error trap is implementation dependent.

218-U3-Cs20 async_data_error
Whether the async_data_error exception is implemented is implementation dependent. If it
does exist, it indicates that an error was detected in a processor core, and its trap type is 40₁₆.

219-U3 Asynchronous Fault Address register (AFAR) allocation
Allocation of the Asynchronous Fault Address register (AFAR) is implementation dependent.
There may be one instance or multiple instances of AFAR. Although the ASI for AFAR is
defined as 4D₁₆, the virtual address of AFAR, if there are multiple AFARs, is implementation
dependent.

220-U3 Addition of logging and control registers for error handling
Whether the implementation supports additional logging and control registers for error
handling is implementation dependent.

221-U3 Special/signalling ECCs
The method used to generate "special" or "signalling" ECCs, and whether a processor ID is
embedded into the data associated with special/signalling ECCs, are implementation
dependent.

223-U3 TLB multiple-hit detection
Whether TLB multiple-hit detection is supported in an UltraSPARC Architecture
implementation is implementation dependent.

225-U3 TLB locking of entries
The mechanism by which entries in the TLB are locked is implementation dependent in
UltraSPARC Architecture implementations.

228-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

229-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

230-U3 TSB Base address generation
Whether the implementation generates the TSB Base address by exclusive-ORing the TSB
Base register and a TSB register or by taking the tsb_base field directly from a TSB register
is implementation dependent in the UltraSPARC Architecture. This implementation
dependency existed for UltraSPARC III/IV only to maintain compatibility with the TLB
miss handling software of UltraSPARC I/II.

231 Reserved.

232-U3-Cs10 data_access_exception trap
The causes of a data_access_exception trap are implementation dependent in UltraSPARC
Architecture 2005.

233-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

234 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.
Nbr Description Page


235-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


236-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


237-U3 JMPL/RETURN mem_address_not_aligned
Whether the fault status and/or address (D-SFSR/D-SFAR) is captured when a 
mem_address_not_aligned trap occurs during a JMPL or RETURN instruction is 
implementation dependent.


239-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


240-U3-Cs10 Reserved.


241-U3 Address Masking and D-SFAR 98
When PSTATE.am = 1 and an exception occurs, the value written to the more-significant 32 
bits of the Data Synchronous Fault Address Register (D-SFAR) is implementation 
dependent.


243-U3 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


244-U3-Cs10 Data Watchpoint Reliability
Data Watchpoint traps are completely implementation dependent in UltraSPARC 
Architecture processors.


245-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


246-U3 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


247-U3 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


248-U3 Conditions for fp_exception_other with unfinished_FPop 65
The conditions under which an fp_exception_other exception with floating-point trap type 
of unfinished_FPop can occur are implementation dependent. An implementation may 
cause fp_exception_other with unfinished_FPop under a different (but specified) set of 
conditions.


249-U3-Cs10 Data Watchpoint for Partial Store Instruction 349
For an STPARTIAL instruction, the following aspects of data watchpoints are 
implementation dependent: (a) whether data watchpoint logic examines the byte store 
mask in R[rs2] or conservatively behaves as if every Partial Store always stores all 8 
bytes, and (b) whether data watchpoint logic examines individual bits in the Virtual 
(Physical) Data Watchpoint Mask in the LSU Control register to determine which bytes are 
being watched or (when the Watchpoint Mask is nonzero) conservatively behaves as if 
all 8 bytes are being watched.
Nbr Description Page


250-U3-Cs10 PCR accessibility when PSTATE.priv = 0 78, 305, 378
In an UltraSPARC Architecture implementation, PCR is never accessible to nonprivileged 
software. Specifically, when a virtual processor is operating in nonprivileged mode 
(PSTATE.priv = 0 and HPSTATE.hpriv = 0), an attempt to access PCR (using an RDPCR or 
a WRPCR instruction) results in a privileged_opcode exception.


251 Reserved.


252-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


253-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


254-U3-Cs10 Means of exiting error state 460, 466, 493, 505, 566
A virtual processor, upon entering error state, automatically generates a 
watchdog reset (WDR).


257-U3 LDDFA with ASI C0₁₆-C5₁₆ or C8₁₆-CD₁₆ and misaligned memory address 256
If an LDDFA opcode is used with an ASI of C0₁₆-C5₁₆ or C8₁₆-CD₁₆ (Partial Store ASIs, 
which are an illegal combination with LDDFA) and a memory address is specified with 
less than 8-byte alignment, the virtual processor generates an exception. It is 
implementation dependent whether the exception generated is data_access_exception, 
mem_address_not_aligned, or LDDF_mem_address_not_aligned.


258-U3-Cs10 ASI_SERIAL_ID
(This register is not defined in the UltraSPARC Architecture, so this implementation 
dependency does not apply to UltraSPARC Architecture 2005.)


259-299 Reserved.


300-U4-Cs10 Attempted access to ASI registers with LDTWA 269
If an LDTWA instruction referencing a non-memory ASI is executed, it generates a 
data_access_exception exception.


301-U4-Cs10 Attempted access to ASI registers with STTWA 355
If an STTWA instruction referencing a non-memory ASI is executed, it generates a 
data_access_exception exception.


302-U4-Cs10 Scratchpad registers 445
An UltraSPARC Architecture processor includes eight privileged Scratchpad registers (64 
bits each, read/write accessible).


303-U4-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


304-U4-Cs10 XIR 565
XIR affects only the virtual processors identified in the XIR_STEERING register (not a 
whole system).


305-U4-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.
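The ASI ranges named in impl. dep. #257 can be expressed as a small predicate. The C sketch below is illustrative only: the function names are hypothetical, and it merely classifies the case the dependency covers rather than modeling which of the three exceptions a particular implementation raises.

```c
#include <stdbool.h>
#include <stdint.h>

/* Partial Store ASIs listed in impl. dep. #257: C0-C5 and C8-CD (hexadecimal). */
bool is_partial_store_asi(uint8_t asi)
{
    return (asi >= 0xC0 && asi <= 0xC5) || (asi >= 0xC8 && asi <= 0xCD);
}

/* True when an LDDFA with this ASI and address is the case covered by
 * impl. dep. #257: a Partial Store ASI (illegal with LDDFA) combined with an
 * address that is not 8-byte aligned. Which exception is then generated
 * (data_access_exception, mem_address_not_aligned, or
 * LDDF_mem_address_not_aligned) is implementation dependent, so this
 * sketch only classifies the case. */
bool lddfa_is_impl_dep_257_case(uint8_t asi, uint64_t addr)
{
    return is_partial_store_asi(asi) && (addr & 0x7) != 0;
}
```

For example, an LDDFA using ASI C4₁₆ at address 1004₁₆ falls under #257, while the same ASI at the 8-byte-aligned address 1008₁₆ does not.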
Nbr
306-U4-Cs10
307-U4-Cs10
308-U3-Cs10
309-U4-Cs10
311-319
321-U4
322-U4
323-U4
324-U4
325-U4
326-U4-Cs10
327-399


Description Page


Trap type generated upon attempted access to noncacheable page with LDTXA 271
When an LDTXA instruction attempts access from an address that is not mapped to 
cacheable memory space, a data_access_exception exception is generated.


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


Reserved. 


Reserved. 


Strand Interrupt ID register 539
Whether any portion of the int_id field of the Strand Interrupt ID register is read-only is 
implementation dependent.


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. 


Power used by CMT 541
Whether disabling a virtual processor reduces the power used by a CMT processor is 
implementation dependent.


Updating Strand Enable Register 543
Whether an implementation provides a restriction that prevents software from writing a 
value of all zeroes (or zeroes corresponding to all available virtual processors) to the 
STRAND_ENABLE register is implementation dependent. This restriction avoids the 
dangerous case where all virtual processors become disabled and the only way to enable 
any virtual processor is a hard power_on_reset (a warm reset would not suffice). If such a 
restriction is implemented and software running on any virtual processor attempts to write 
a value of all zeroes (or zeroes corresponding to all available virtual processors) to the 
STRAND_ENABLE register, hardware forces the STRAND_ENABLE register to an 
implementation-dependent value which enables at least one of the available virtual 
processors.


Parking a virtual processor 545
Whether parking a virtual processor reduces the power used by a CMT processor is 
implementation dependent.


XIR Steering register (XIR Reset) 552, 553, 570
a: Whether XIR_STEERING{n} is a read-only bit or a read/write bit is implementation 
dependent. If XIR_STEERING{n} is read-only, then (1) writes to XIR_STEERING{n} are 
ignored and (2) XIR_STEERING{n} is set to 1 if virtual processor n is available and to 0 if 
it is not available (that is, XIR_STEERING{n} reads the same as STRAND_AVAILABLE{n}).

b: If XIR_STEERING{n} is read/write, upon de-assertion of reset the value of 
STRAND_AVAILABLE{n} is copied to XIR_STEERING{n} for all UltraSPARC Architecture 
implementations.


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


Reserved 
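The optional STRAND_ENABLE write restriction described under "Updating Strand Enable Register" can be modeled in a few lines. This C sketch assumes bit n of each mask corresponds to virtual processor n, and, because the forced value is implementation dependent, it arbitrarily chooses to enable only the lowest-numbered available virtual processor; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a STRAND_ENABLE write when the optional restriction is
 * implemented. Bit n of each mask stands for virtual processor n.
 * Returns the value the register would hold after the write. */
uint64_t strand_enable_write(uint64_t requested, uint64_t strand_available)
{
    if ((requested & strand_available) != 0)
        return requested;   /* at least one available VP stays enabled */

    /* The real forced value is implementation dependent; this sketch assumes
     * hardware enables only the lowest-numbered available VP
     * (x & (~x + 1) isolates the lowest set bit of x). */
    return strand_available & (~strand_available + 1);
}
```

With four available virtual processors (mask F₁₆), a write of 0 would leave the register at 1 under this assumption, while a write of 5 is accepted unchanged.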
Nbr Description Page


400-S10 Global Level register (GL) implementation 103
Although GL is defined as a 4-bit register, an implementation may implement any subset 
of those bits sufficient to encode the values from 0 to MAXGL for that implementation. If 
any bits of GL are not implemented, they read as zero and writes to them are ignored.


401-S10 Maximum Global Level (MAXPGL, MAXGL) 100, 102
The architectural parameter MAXPGL is a constant for each implementation; its legal values 
are from 2 to 15 (supporting from 3 to 16 sets of global registers visible to privileged 
software). In a typical implementation MAXPGL = MAXPTL (see impl. dep. #101-V9-CS10). 
The architectural parameter MAXGL is a constant for each implementation; its legal values 
are from 4 to 15 (supporting from 5 to 16 sets of global registers). 
Architecturally, MAXPTL must be = 2 and MAXGL must be > MAXPGL.


402-S10 Priority of internal_processor_error 480, 484, 502
The trap priority of the internal_processor_error exception is implementation dependent. 
Furthermore, its priority may vary within an implementation, based on the cause of the 
error being reported.


403-S10 Setting of "dirty" bits in FPRS 77
A “dirty” bit (du or dl) in the FPRS register must be set to ‘1’ if any of its corresponding F 
registers is actually modified. The specific conditions under which a dirty bit is set are 
implementation dependent.


404-S10 Privileged Scratchpad registers 4 through 7 445
The degree to which Scratchpad registers 4-7 are accessible to privileged software is 
implementation dependent. Each may be (1) fully accessible, (2) accessible, with access 
much slower than to Scratchpad registers 0-3 (emulated by trap to hyperprivileged 
software), or (3) inaccessible (causing a data_access_exception exception).


405-S10 Virtual address range 28
An UltraSPARC Architecture implementation may support a full 64-bit virtual address 
space or a more limited range of virtual addresses. In an implementation that does not 
support a full 64-bit virtual address space, the supported range of virtual addresses is 
restricted to two equal-sized ranges at the extreme upper and lower ends of 64-bit 
addresses; that is, for n-bit virtual addresses, the valid address ranges are $0$ to $2^{n-1} - 1$ and 
$2^{64} - 2^{n-1}$ to $2^{64} - 1$.


406-S10 HTBA high-order bits 108
It is implementation dependent whether all 50 bits of HTBA{63:14} are implemented or if 
only bits n-1:0 are implemented. If the latter, writes to bits 63:n are ignored, and when 
HTBA is read, bits 63:n read as sign-extended copies of the most significant implemented 
bit, HTBA{n-1}.


407-S10 Hyperprivileged Scratchpad register aliasing 446
It is implementation dependent whether each hyperprivileged Scratchpad register is 
aliased to the corresponding privileged Scratchpad register or is an independent 
register.


408-S10 HPSTATE bit 11 105
The contents and semantics of HPSTATE{11} are implementation dependent.
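Impl. deps. #405 and #406 both rest on sign extension from bit n-1: an n-bit virtual address is valid exactly when bits 63:n-1 are all equal, and an HTBA with only bits n-1:0 implemented reads back bits 63:n as copies of bit n-1. The C sketch below assumes 1 ≤ n ≤ 64; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low n bits of v to 64 bits (1 <= n <= 64). */
uint64_t sign_extend(uint64_t v, unsigned n)
{
    assert(n >= 1 && n <= 64);
    if (n == 64)
        return v;
    uint64_t sign = (uint64_t)1 << (n - 1);
    v &= ((uint64_t)1 << n) - 1;   /* keep bits n-1:0 */
    return (v ^ sign) - sign;      /* copy bit n-1 into bits 63:n */
}

/* Impl. dep. #405: with n-bit virtual addresses, va is valid if it lies in
 * [0, 2^(n-1) - 1] or [2^64 - 2^(n-1), 2^64 - 1], i.e. it is unchanged by
 * sign extension from bit n-1. */
bool va_in_range(uint64_t va, unsigned n)
{
    return sign_extend(va, n) == va;
}
```

For a hypothetical 48-bit implementation, 00007FFF_FFFFFFFF₁₆ and FFFF8000_00000000₁₆ are the innermost valid addresses, and anything strictly between them lies in the unsupported gap.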
Nbr Description Page
409-S10-Cs20 FLUSH instruction and memory consistency 189, 191
The implementation of the FLUSH instruction is implementation dependent.

If the implementation automatically maintains consistency between instruction and data 
memory,

(1) the FLUSH address is ignored and

(2) the FLUSH instruction cannot cause any data access exceptions, because its effective 
address operand is not translated or used by the MMU.

On the other hand, if the implementation does not maintain consistency between 
instruction and data memory, the FLUSH address is used to access the MMU and the 
FLUSH instruction can cause data access exceptions.


410-S10 Block Load behavior 248, 249, 402

The following aspects of the behavior of block load (LDBLOCKF) instructions are 
implementation dependent:

* What memory ordering model is used by LDBLOCKF (LDBLOCKF is not required to 
follow TSO memory ordering)

* Whether LDBLOCKF follows memory ordering with respect to stores (including block 
stores), including whether the virtual processor detects read-after-write and 
write-after-read hazards to overlapping addresses

* Whether LDBLOCKF appears to execute out of order, or follow LoadLoad ordering 
(with respect to older loads, younger loads, and other LDBLOCKFs)

* Whether LDBLOCKF follows register-dependency interlocks, as do ordinary load 
instructions

* Whether LDBLOCKFs to non-cacheable locations are 
(a) strictly ordered, 
(b) not strictly ordered and cause a data_access_exception exception, or 
(c) not strictly ordered and silently execute without causing an exception (option (c) is 
strongly discouraged)

* Whether the MMU ignores the side-effect bit (TTE.e) for LDBLOCKF accesses 
(in which case, LDBLOCKFs behave as if TTE.e = 0)

* Whether VA watchpoint exceptions are recognized on accesses to all 64 bytes of a 
LDBLOCKF (the recommended behavior), or only on accesses to the first eight bytes
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Nbr Description Page 


411-S10 Block Store behavior 337, 402
The following aspects of the behavior of block store (STBLOCKF) instructions are 
implementation dependent:

* The memory ordering model that STBLOCKF follows (other than as constrained by the 
rules outlined on page 337).

* Whether VA watchpoint exceptions are recognized on accesses to all 64 bytes of a 
STBLOCKF (the recommended behavior), or only on accesses to the first eight bytes.

* Whether STBLOCKFs to non-cacheable (TTE.cp = 0) pages execute in strict program 
order or not. If not, a STBLOCKF to a non-cacheable page causes a 
data_access_exception exception.

* Whether STBLOCKF follows register dependency interlocks (as ordinary stores do).

* Whether a non-Commit STBLOCKF forces the data to be written to memory and 
invalidates copies in all caches present (as the Commit variants of STBLOCKF do).

* Whether the MMU ignores the side-effect bit (TTE.e) for STBLOCKF accesses 
(in which case, STBLOCKFs behave as if TTE.e = 0)

* Any other restrictions on the behavior of STBLOCKF, as described in 
implementation-specific documentation.


412-S10 MEMBAR behavior 277
An UltraSPARC Architecture implementation may define the operation of each MEMBAR 
variant in any manner that provides the required semantics. 


413-S10 Load Twin Extended Word behavior 271
It is implementation dependent whether VA watchpoint exceptions are recognized on 
accesses to all 16 bytes of a LDTXA instruction (the recommended behavior) or only on 
accesses to the first 8 bytes. 


414 Reserved. — 


417-S10 Behavior of DONE and RETRY when TSTATE[TL].pstate.am = 1 99, 169, 313
If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE or RETRY instruction is executed (which 
sets PSTATE.am to '1' by restoring the value from TSTATE[TL].pstate.am to PSTATE.am), 
it is implementation dependent whether the DONE or RETRY instruction masks (zeroes) 
the more-significant 32 bits of the values it places into PC and NPC. 








418 Unused.


419-S10 Contents of TPC[TL], TNPC[TL], TSTATE[TL], and HTSTATE[TL] after a Warm Reset 
(WMR) 569
It is implementation dependent whether, after a Warm Reset (WMR), the contents of 
TPC[TL], TNPC[TL], TSTATE[TL], and HTSTATE[TL] are unchanged from their values 
before the WMR, or contain the same values as are saved during a WDR, XIR, or SIR reset. 
(The latter implementation is the preferred one.)
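The address masking that impl. dep. #417 refers to (treating the more-significant 32 bits of an address as zero when PSTATE.am = 1) can be sketched in C as follows. RETRY is modeled as restoring PC from TPC[TL] and NPC from TNPC[TL]; the impl_masks flag is a hypothetical stand-in for the implementation's choice under #417, and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address masking: with PSTATE.am = 1 the more-significant 32 bits are treated as zero. */
uint64_t mask_address(uint64_t addr, bool pstate_am)
{
    return pstate_am ? (addr & 0xFFFFFFFFULL) : addr;
}

typedef struct {
    uint64_t pc;
    uint64_t npc;
} pc_npc;

/* RETRY restores PC from TPC[TL] and NPC from TNPC[TL], and PSTATE.am from
 * TSTATE[TL].pstate.am. impl_masks (hypothetical) selects whether this
 * implementation masks the restored PC and NPC when that restored am is 1. */
pc_npc retry_targets(uint64_t tpc, uint64_t tnpc, bool tstate_am, bool impl_masks)
{
    bool mask = tstate_am && impl_masks;
    pc_npc r = { mask_address(tpc, mask), mask_address(tnpc, mask) };
    return r;
}
```

Software that must run on either kind of implementation therefore cannot rely on the upper 32 bits of PC and NPC after a DONE or RETRY that restores PSTATE.am = 1.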
Nbr Description Page


420-S10 Implementation Dependent Aspects of a Warm Reset (WMR) 565
The following aspects of Warm Reset (WMR) are implementation dependent:

(a) by what means WMR can be applied (for example, write to a reset register or 
assertion/deassertion of an input pin),

(b) the extent to which a processor is reset by WMR (for example, single physical core or 
entire processor (chip), and how the on-chip memory system is affected), and

(c) by what means hyperprivileged software can distinguish between WMR and POR 
resets.


421-S10 Interrupt Queue Head and Tail Register Contents 514
It is implementation dependent whether interrupt queue head and tail registers (a) are 
datatype-agnostic "scratch registers" used for communication between privileged and 
hyperprivileged software, in which case their contents are defined purely by software 
convention, or (b) are maintained to some degree by virtual processor hardware, imposing 
a fixed meaning on their contents.


422-S10 Interrupt Queue Tail Register Writability 514
It is implementation dependent whether tail registers are writable in privileged mode. If 
a tail register is read-only in privileged mode, an attempt to write to it causes a 
data_access_exception exception. If a tail register is writable in privileged mode, an 
attempt to write to it results in undefined behavior.


423-S10 Performance Impact of Disabling a Virtual Processor 541
Whether disabling a virtual processor increases the performance of other virtual processors 
in the CMT is implementation dependent.


424-S10 Ability to Dynamically Enable/Disable a Virtual Processor 544
Whether a CMT implementation provides the ability to dynamically enable and disable 
virtual processors is implementation dependent. It is tightly coupled to the underlying 
microarchitecture of a specific CMT implementation. This feature is implementation 
dependent because any implementation-independent interface would be too inefficient on 
some implementations.


425-S10 TICK Register Counting While a Virtual Processor is Parked 544
It is implementation dependent whether the TICK register continues to count 
while a virtual processor is parked.


426-S10 Performance Impact of Parking a Virtual Processor 545
The degree to which parking a virtual processor impacts the performance of other virtual 
processors is implementation dependent.


427-S10 Latency to Park or Unpark a Virtual Processor 545, 548
There may be an arbitrarily long, but bounded, delay (“skid”) from the time when a virtual 
processor is directed to park or unpark (via an update to the STRAND_RUNNING register) 
until the corresponding virtual processor(s) actually park or unpark.


428-S10 Method by Which Self-Parking is Assured 546
When a virtual processor writes to the STRAND_RUNNING register to park itself, the 
method by which completion of parking is assured (instructions stop being issued) is 
implementation dependent.
Nbr Description Page


429-S10 Which Virtual Processor is Automatically Unparked 547
If an update to the STRAND_RUNNING register would cause all enabled virtual 
processors to become parked, it is implementation dependent which virtual processor is 
automatically unparked by hardware. The preferred implementation is that when an 
update to the STRAND_RUNNING register (STXA instruction) would cause all virtual 
processors to become parked, hardware silently ignores (discards) that STXA instruction.


430-S10 Parking All But One Virtual Processor in a Multiprocessor Configuration 548
In a multiprocessor configuration, whether all but one virtual processor can be parked is 
implementation dependent.


431-S10 Criteria for Completion of Park/Unpark 549
The criteria used for determining whether a virtual processor is fully parked 
(corresponding bit set to ‘1’ in the STRAND_RUNNING_STATUS register) are 
implementation dependent.


432-S10 Standby/Wait state 549
Whether an implementation implements a Standby (or Wait) state for virtual processors, 
how that state is controlled, and how that state is observed are implementation-dependent.


433-S10 Partial-Processor Reset Subsetting Mechanism 551
A mechanism must exist to specify which subset of virtual processors in a processor 
should be reset when a partial-processor reset (for example, XIR) occurs. The specific 
mechanism is implementation-dependent.


434-S10 Error Steering Register(s) 554
Because of the range of implementations, the number of, organization of, and ASI 
assignments for error steering registers in a CMT processor are implementation dependent.


435-S10 Error Steering Register Alternatives 556
Although the ERROR_STEERING register is the recommended mechanism for steering 
non-virtual-processor-specific errors to a virtual processor for handling, the actual 
mechanism used in a given implementation is implementation dependent.


436-S10 Error Steering Register 557
The width of the target_id field of the ERROR_STEERING register is implementation 
dependent.


437-S10 Error Steering Register target_id Field Plurality 557
An implementation may provide multiple target_id fields in an ERROR_STEERING 
register for different types of non-virtual-processor-specific errors.


438-S10 Non-Virtual Processor-Specific Errors in Shared Resources 557
It is implementation dependent whether the error-reporting structures for errors in shared 
resources appear within a virtual processor in per-virtual-processor registers or are 
contained within shared registers associated with the shared structures in which the errors 
may occur.
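The preferred behavior in impl. dep. #429 (discarding an STXA to STRAND_RUNNING that would park every enabled virtual processor) is a simple guard. This C sketch is hypothetical: it assumes that bit n = 1 in STRAND_RUNNING means virtual processor n is unparked and that only virtual processors enabled in STRAND_ENABLE can run; neither the bit polarity nor the names are taken from this appendix.

```c
#include <stdint.h>

/* Hypothetical model of the preferred STRAND_RUNNING update in #429.
 * Assumption: bit n = 1 means virtual processor n runs (is unparked).
 * Returns the register value after the STXA. */
uint64_t strand_running_write(uint64_t current, uint64_t requested,
                              uint64_t strand_enable)
{
    if ((requested & strand_enable) == 0)
        return current;   /* would park every enabled VP: discard the STXA */
    return requested;
}
```

Under these assumptions, with virtual processors 0 and 1 enabled and running, writing 0 is ignored, whereas writing 1 parks only virtual processor 1.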
Nbr Description Page


439-S10 Exception Generated for Each Non-Virtual Processor-Specific Error 557
The type of exception generated in a virtual processor to handle each type of 
non-virtual-processor-specific error is implementation dependent. A virtual processor can 
choose to use the same exceptions used for corresponding virtual-processor-specific 
asynchronous errors or it can choose to generate different exceptions.


440-S10 Which Virtual Processor Unparked During Power-on-Reset (POR) 559
Which virtual processor is unparked during POR and whether it is unparked by processor 
hardware or by a service processor is implementation dependent. Conventionally, the 
virtual processor with the lowest-numbered strand id is unparked.


442-S10 STICK register 85
a: If an accurate count cannot always be returned when STICK is read, any inaccuracy 
should be small, bounded, and documented.

b: An implementation may implement fewer than 63 bits in STICK.counter; however, the 
counter as implemented must be able to count for at least 10 years without 
overflowing. Any upper bits not implemented must read as 0.


443-S10 PSTATE.am for Physical Addresses in Hyperprivileged Mode 99
In hyperprivileged mode, when PSTATE.am = 1 and physical addressing is being used, it 
is implementation-dependent whether the more-significant 32 bits of addresses are masked 
(treated as zero).


444-449 Reserved for UltraSPARC Architecture 2005


450 and up Reserved for future use


450-499 Reserved for UltraSPARC Architecture 2007
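Requirement b of impl. dep. #442 sets a lower bound on the width of STICK.counter that depends on the counting frequency. The C sketch below computes that bound for a given (hypothetical) frequency, assuming 365.25-day years; the function and macro names are illustrative.

```c
#include <stdint.h>

/* Seconds in 10 years of 365.25 days: 10 * 365.25 * 86400 (an assumption;
 * the requirement only says "at least 10 years"). */
#define TEN_YEARS_S 315576000ULL

/* Minimum number of STICK.counter bits so that a counter incremented at
 * freq_hz does not overflow within ten years. Valid for freq_hz below about
 * 58 GHz, where the tick count still fits in 64 bits. */
unsigned stick_min_bits(uint64_t freq_hz)
{
    uint64_t ticks = freq_hz * TEN_YEARS_S;
    unsigned bits = 0;
    while (ticks != 0) {   /* bit length of ticks, so 2^bits - 1 >= ticks */
        bits++;
        ticks >>= 1;
    }
    return bits;
}
```

For a counter running at 1 GHz this gives 59 bits, comfortably within the 63-bit STICK.counter field.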
APPENDIX C





Assembly Language Syntax 


This appendix supports Chapter 7, Instructions. Each instruction description in 
Chapter 7 includes a table that describes the suggested assembly language format 
for that instruction. This appendix describes the notation used in those assembly 
language syntax descriptions and lists some synthetic instructions provided by 
UltraSPARC Architecture assemblers for the convenience of assembly language 
programmers. 


The appendix contains these sections: 


■ Notation Used on page 609.
■ Syntax Design on page 616.
■ Synthetic Instructions on page 616.





C.1 Notation Used 


The notations defined here are also used in the assembly language syntax 
descriptions in Chapter 7, Instructions. 


Items in typewriter font are literals to be written exactly as they appear. Items 
in italic font are metasymbols that are to be replaced by numeric or symbolic values 
in actual SPARC V9 assembly language code. For example, “imm_asi” would be 
replaced by a number in the range 0 to 255 (the value of the imm_asi bits in the 
binary instruction) or by a symbol bound to such a number. 


Subscripts on metasymbols further identify the placement of the operand in the 
generated binary instruction. For example, reg_rs2 is a reg (register name) whose 
binary value will be placed in the rs2 field of the resulting instruction.




C.1.1 Register Names


reg. A reg is an integer register name. It can have any of the following values:¹
%r0-%r31
%g0-%g7 (global registers; same as %r0-%r7)
%o0-%o7 (out registers; same as %r8-%r15)
%l0-%l7 (local registers; same as %r16-%r23)
%i0-%i7 (in registers; same as %r24-%r31)
%fp (frame pointer; conventionally same as %i6)
%sp (stack pointer; conventionally same as %o6)


Subscripts identify the placement of the operand in the binary instruction as one of 
the following:


reg_rs1 (rs1 field)
reg_rs2 (rs2 field)
reg_rd (rd field)
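The aliases in the reg list map onto the 32 r registers with fixed offsets (%g at 0, %o at 8, %l at 16, %i at 24). A minimal C sketch of that mapping follows; reg_number() is a hypothetical helper, not part of any assembler interface, and it returns -1 for names not in the list.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Map an integer register name to its r-register number (0-31), or -1 if invalid. */
int reg_number(const char *name)
{
    if (name == NULL || name[0] != '%')
        return -1;
    if (strcmp(name, "%sp") == 0) return 14;   /* conventionally %o6 */
    if (strcmp(name, "%fp") == 0) return 30;   /* conventionally %i6 */

    char kind = name[1];
    const char *digits = name + 2;
    if (kind == '\0' || *digits == '\0')
        return -1;
    for (const char *p = digits; *p; p++)
        if (!isdigit((unsigned char)*p))
            return -1;
    long n = strtol(digits, NULL, 10);

    switch (kind) {
    case 'r': return n <= 31 ? (int)n : -1;
    case 'g': return n <= 7 ? (int)n : -1;        /* %r0-%r7   */
    case 'o': return n <= 7 ? 8 + (int)n : -1;    /* %r8-%r15  */
    case 'l': return n <= 7 ? 16 + (int)n : -1;   /* %r16-%r23 */
    case 'i': return n <= 7 ? 24 + (int)n : -1;   /* %r24-%r31 */
    default:  return -1;
    }
}
```

For instance, %o7 and %r15 name the same register, as do %sp and %o6.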


freg. An freg is a floating-point register name. It may have the following values:
%f0, %f1, %f2, ... %f31
%f32, %f34, ... %f60, %f62 (even-numbered only, from %f32 to %f62)
%d0, %d2, %d4, ... %d60, %d62 (%dn, where n mod 2 = 0, only)
%q0, %q4, %q8, ... %q56, %q60 (%qn, where n mod 4 = 0, only)


See Floating-Point Registers on page 55 for a detailed description of how the 
single-precision, double-precision, and quad-precision floating-point registers 
overlap. 


Subscripts further identify the placement of the operand in the binary instruction as 
one of the following:

freg_rs1 (rs1 field)

freg_rs2 (rs2 field)

freg_rs3 (rs3 field)

freg_rd (rd field)
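The value list for freg amounts to a per-prefix rule on the register number. This C sketch encodes only those rules (any of %f0-%f31, even numbers above that, %dn with n mod 2 = 0, %qn with n mod 4 = 0, all below 64); freg_valid() is a hypothetical name.

```c
#include <stdbool.h>

/* Is %<kind><n> a valid freg name per the list above? kind is 'f', 'd' or 'q'. */
bool freg_valid(char kind, int n)
{
    if (n < 0 || n > 63)
        return false;
    switch (kind) {
    case 'f': return n < 32 || n % 2 == 0;   /* %f32-%f62: even only */
    case 'd': return n % 2 == 0;             /* %d0-%d62 */
    case 'q': return n % 4 == 0;             /* %q0-%q60 */
    default:  return false;
    }
}
```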


asr_reg. An asr_reg is an Ancillary State Register name. It may have one of the 
following values: 
%asr16-%asr31


Subscripts further identify the placement of the operand in the binary instruction as 
one of the following:


asr_reg_rs1 (rs1 field)
asr_reg_rd (rd field)


¹ In actual usage, the %sp, %fp, %gn, %on, %ln, and %in forms are preferred over %rn.


i_or_x_cc. An i_or_x_cc specifies a set of integer condition codes, those based on 
either the 32-bit result of an operation (icc) or on the full 64-bit result (xcc). It may 
have either of the following values:

%icc

%xcc


fccn. An fccn specifies a set of floating-point condition codes. It can have any of 
the following values:


%fcc0
%fcc1
%fcc2
%fcc3


C.1.2 Special Symbol Names


Certain special symbols appear in the syntax table in typewriter font. They must be 
written exactly as they are shown, including the leading percent sign (%).


The symbol names and the registers or operators to which they refer are as follows: 


%asi Address Space Identifier (ASI) register
%canrestore Restorable Windows register
%cansave Savable Windows register
%ccr Condition Codes register
%cleanwin Clean Windows register
%cwp Current Window Pointer (CWP) register
%fprs Floating-Point Registers State (FPRS) register
%fsr Floating-Point State register
%gsr General Status Register (GSR)
%hintp Hyperprivileged Interrupt Pending (HINTP) register
%hpstate Hyperprivileged State (HPSTATE) register
%hstick_cmpr Hyperprivileged System Tick Compare (HSTICK_CMPR) register
%htba Hyperprivileged Trap Base Address (HTBA) register
%htstate Hyperprivileged Trap State (HTSTATE) register
%hver Hyperprivileged Version (HVER) register
%otherwin Other Windows (OTHERWIN) register
%pc Program Counter (PC) register
%pcr Performance Control Register (PCR)
%pic Performance Instrumentation Counters
%pil Processor Interrupt Level register
%pstate Processor State register
%softint Soft Interrupt register
%softint_clr Soft Interrupt register (clear selected bits)
%softint_set Soft Interrupt register (set selected bits)
%stick† System Timer (STICK) register
%stick_cmpr† System Timer Compare (STICK_CMPR) register
%tba Trap Base Address (TBA) register
%tick Cycle count (TICK) register
%tick_cmpr Timer Compare (TICK_CMPR) register
%tl Trap Level (TL) register
%tnpc Trap Next Program Counter (TNPC) register
%tpc Trap Program Counter (TPC) register
%tstate Trap State (TSTATE) register
%tt Trap Type (TT) register
%wstate Window State register
%y Y register


† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and 
%sys_tick_cmpr, which are now deprecated. Over time, assemblers will support the new %stick and 
%stick_cmpr names for these registers (which are consistent with %tick, %tick_cmpr, and 
%hstick_cmpr). In the meantime, some existing assemblers may only recognize the original names.


The following special symbol names are prefix unary operators that perform the 
functions described, on an argument that is a constant, symbol, or expression that 
evaluates to a constant offset from a symbol: 
%hh Extracts bits 63:42 (high 22 bits of upper word) of its operand
%hm Extracts bits 41:32 (low-order 10 bits of upper word) of its operand
%hi or %lm Extracts bits 31:10 (high-order 22 bits of low-order word) of its operand
%lo Extracts bits 9:0 (low-order 10 bits) of its operand


For example, the value of "%lo(symbol)" is the least-significant 10 bits of symbol.
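The four operators split a 64-bit value into two 22-bit and two 10-bit pieces that fit SETHI and simm13 operands. The C sketch below mirrors them and, as an illustration, rebuilds the value the way a sethi/or/sllx/sethi/or/or sequence would; the function names and the exact instruction sequence are assumptions for illustration, not assembler definitions.

```c
#include <stdint.h>

/* C equivalents of the %hh, %hm, %hi/%lm and %lo operators. */
uint32_t op_hh(uint64_t v) { return (uint32_t)(v >> 42) & 0x3FFFFF; } /* bits 63:42 */
uint32_t op_hm(uint64_t v) { return (uint32_t)(v >> 32) & 0x3FF; }    /* bits 41:32 */
uint32_t op_lm(uint64_t v) { return (uint32_t)(v >> 10) & 0x3FFFFF; } /* bits 31:10 */
uint32_t op_lo(uint64_t v) { return (uint32_t)v & 0x3FF; }            /* bits 9:0   */

/* Rebuild v from its pieces, as a sethi/or/sllx/sethi/or/or sequence would. */
uint64_t rebuild(uint64_t v)
{
    uint64_t upper = ((uint64_t)op_hh(v) << 10) | op_hm(v); /* sethi %hh(v); or %hm(v) */
    uint64_t lower = ((uint64_t)op_lm(v) << 10) | op_lo(v); /* sethi %lm(v); or %lo(v) */
    return (upper << 32) | lower;                           /* sllx 32; or */
}
```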


Certain predefined value names appear in the syntax table in typewriter font. 
They must be written exactly as they are shown, including the leading sharp sign 
(#). The value names and the constant values to which they are bound are listed in 
TABLE C-1.


TABLE C-1 Value Names and Values


Value Name in Assembly Language   Value       Comments
#n_reads                          0           for PREFETCH instruction "fcn" field
#one_read                         1
#n_writes                         2
#one_write                        3
#page                             4
#unified                          17 (11₁₆)
#n_reads_strong                   20 (14₁₆)
#one_read_strong                  21 (15₁₆)
#n_writes_strong                  22 (16₁₆)
#one_write_strong                 23 (17₁₆)
#LoadLoad                         01₁₆        for MEMBAR instruction "mmask" field
#StoreLoad                        02₁₆
#LoadStore                        04₁₆
#StoreStore                       08₁₆
#Lookaside                        10₁₆        for MEMBAR instruction "cmask" field
#MemIssue                         20₁₆
#Sync                             40₁₆
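Because the MEMBAR value names are single-bit masks, a membar_mask operand is formed by ORing them, and the result splits into the mmask (bits 3:0) and cmask (bits 6:4) fields. A C sketch of that arithmetic follows; the MB_* enumerator names and helper functions are hypothetical.

```c
/* Values of the MEMBAR mask names from TABLE C-1 (enumerator names hypothetical). */
enum membar_mask {
    MB_LOADLOAD   = 0x01,   /* mmask */
    MB_STORELOAD  = 0x02,
    MB_LOADSTORE  = 0x04,
    MB_STORESTORE = 0x08,
    MB_LOOKASIDE  = 0x10,   /* cmask */
    MB_MEMISSUE   = 0x20,
    MB_SYNC       = 0x40
};

/* Split a 7-bit membar_mask into its mmask (bits 3:0) and cmask (bits 6:4) parts. */
unsigned membar_mmask(unsigned mask) { return mask & 0xF; }
unsigned membar_cmask(unsigned mask) { return (mask >> 4) & 0x7; }
```

For example, #StoreLoad | #StoreStore | #Sync evaluates to 4A₁₆, giving an mmask of A₁₆ and a cmask of 4.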





C.1.3 Values


Some instructions use operand values as follows: 


const4      A constant that can be represented in 4 bits
const22     A constant that can be represented in 22 bits
imm_asi     An alternate address space identifier (0-255)
siam_mode   A 3-bit mode value for the SIAM instruction
simm7       A signed immediate constant that can be represented in 7 bits
simm8       A signed immediate constant that can be represented in 8 bits
simm10      A signed immediate constant that can be represented in 10 bits
simm11      A signed immediate constant that can be represented in 11 bits
simm13      A signed immediate constant that can be represented in 13 bits
value       Any 64-bit value
shcnt32     A shift count from 0-31
shcnt64     A shift count from 0-63
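"Can be represented in n bits" for the simm operands means the two's-complement range $-2^{n-1}$ to $2^{n-1} - 1$; for simm13 that is -4096 to 4095. The C sketch below checks an operand against these ranges and against the shift-count ranges; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a signed two's-complement field of the given width (simm7, simm13, ...). */
bool fits_simm(int64_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* shcnt32 accepts 0-31 (width 32); shcnt64 accepts 0-63 (width 64). */
bool valid_shcnt(int64_t v, unsigned width)
{
    return v >= 0 && v < (int64_t)width;
}
```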
C.1.4 Labels


A label is a sequence of characters that comprises alphabetic letters (a-z, A-Z [with 
upper and lower case distinct]), underscores (_), dollar signs ($), periods (.), and 
decimal digits (0-9). A label may contain decimal digits, but it may not begin with 
one. A local label contains digits only.
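The label rules translate directly into two small checks: an ordinary label uses only the listed characters and does not start with a digit, while a local label is all digits. The C sketch below tests characters explicitly rather than through locale-dependent ctype functions; all function names are hypothetical.

```c
#include <stdbool.h>

static bool is_dec_digit(char c) { return c >= '0' && c <= '9'; }

static bool is_label_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           is_dec_digit(c) || c == '_' || c == '$' || c == '.';
}

/* A local label consists of decimal digits only. */
bool is_local_label(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s; s++)
        if (!is_dec_digit(*s))
            return false;
    return true;
}

/* An ordinary label uses only the allowed characters and does not begin with a digit. */
bool is_label(const char *s)
{
    if (*s == '\0' || is_dec_digit(*s))
        return false;
    for (; *s; s++)
        if (!is_label_char(*s))
            return false;
    return true;
}
```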


C.1.5 Other Operand Syntax 


Some instructions allow several operand syntaxes, as follows: 


reg_plus_imm Can be any of the following: 


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + simm13
reg_rs1 - simm13
simm13 (equivalent to %g0 + simm13)
simm13 + reg_rs1 (equivalent to reg_rs1 + simm13)


address Can be any of the following: 


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + simm13
reg_rs1 - simm13
simm13 (equivalent to %g0 + simm13)
simm13 + reg_rs1 (equivalent to reg_rs1 + simm13)
reg_rs1 + reg_rs2


membar_mask Is the following: 


const7 A constant that can be represented in 7 bits. Typically, this is an 
expression involving the logical OR of some combination of 
#Lookaside, #MemIssue, #Sync, #StoreStore, #LoadStore, 
#StoreLoad, and #LoadLoad (see TABLE 7-7 and TABLE 7-8 on 
page 276 for a complete list of mnemonics). 


prefetch_fcn (prefetch function) Can be any of the following: 
0-31 
Predefined constants (the values of which fall in the 0-31 range) useful as 
prefetch_fcn values can be found in TABLE C-1 on page 612. 


regaddr (register-only address) Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + reg_rs2


reg_or_imm (register or immediate value) Can be either of:


reg_rs2
simm13


reg_or_imm10 (register or immediate value) Can be either of:


reg_rs2
simm10


reg_or_imm11 (register or immediate value) Can be either of:


reg_rs2
simm11


reg_or_shcnt (register or shift count value) Can be any of:


reg_rs2
shcnt32
shcnt64


software_trap_number Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + reg_rs2
reg_rs1 + simm8
reg_rs1 - simm8
simm8 (equivalent to %g0 + simm8)
simm8 + reg_rs1 (equivalent to reg_rs1 + simm8)


The resulting operand value (software trap number) must be in the range 0-255,
inclusive.


C.1.6 Comments 


Two types of comments are accepted by the SPARC V9 assembler: C-style "/* ... */"
comments, which may span multiple lines, and "! ..." comments, which extend
from the "!" to the end of the line.
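A tool that scans SPARC assembly, for example to count instructions, has to discard both comment styles. The Python sketch below does so. It assumes comments never appear inside string or character literals, which a full assembler would have to handle, and it keeps the newlines found inside /* ... */ comments so that line numbers are preserved. The function name is hypothetical.

```python
def strip_comments(source: str) -> str:
    """Remove '/* ... */' and '! ...' comments from SPARC assembly source."""
    out = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            out.append("\n" * source.count("\n", i, end))  # keep line numbering
            i = end + 2
        elif source[i] == "!":
            end = source.find("\n", i)
            i = n if end == -1 else end  # resume at the newline itself
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


text = "add %o0, 1, %o0 ! bump counter\nnop /* multi\nline */ retl\n"
print(strip_comments(text))
```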
C.2 Syntax Design


The SPARC V9 assembly language syntax is designed so that the following 
statements are true: 


• The destination operand (if any) is consistently specified as the last (rightmost)
operand in an assembly language instruction.


• A reference to the contents of a memory location (for example, in a load, store, or
load-store instruction) is always indicated by square brackets ([]); a reference to
the address of a memory location (such as in a JMPL, CALL, or SETHI) is specified
directly, without square brackets.





C.3 Synthetic Instructions 


TABLE C-2 describes the mapping of a set of synthetic (or “pseudo”) instructions to 
actual instructions. These synthetic instructions are provided by the SPARC V9 
assembler for the convenience of assembly language programmers. 


Note: Synthetic instructions should not be confused with “pseudo ops,” which 
typically provide information to the assembler but do not generate instructions. 
Synthetic instructions always generate instructions; they provide more mnemonic 
syntax for standard SPARC V9 instructions. 
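As a rough illustration of this one-to-one mapping, the Python sketch below rewrites a few register-operand rows of TABLE C-2 (cmp, tst, not, neg, inc, clr, mov, ret, retl) into their SPARC V9 forms using string templates. The template table and function name are hypothetical. Memory forms such as clr [address], and the %y and %asrn forms of mov, map to different instructions (stw, rd, wr) and are deliberately rejected rather than modeled.

```python
# (synthetic mnemonic, operand count) -> SPARC V9 template, per TABLE C-2.
SYNTHETIC_TEMPLATES = {
    ("cmp", 2): "subcc {0}, {1}, %g0",
    ("tst", 1): "orcc %g0, {0}, %g0",
    ("not", 2): "xnor {0}, %g0, {1}",
    ("not", 1): "xnor {0}, %g0, {0}",
    ("neg", 2): "sub %g0, {0}, {1}",
    ("neg", 1): "sub %g0, {0}, {0}",
    ("inc", 1): "add {0}, 1, {0}",
    ("inc", 2): "add {1}, {0}, {1}",   # inc const13, reg_rd
    ("clr", 1): "or %g0, %g0, {0}",    # register form only
    ("mov", 2): "or %g0, {0}, {1}",    # reg_or_imm, reg_rd form only
    ("ret", 0): "jmpl %i7+8, %g0",
    ("retl", 0): "jmpl %o7+8, %g0",
}
SYNTHETIC_NAMES = {name for name, _ in SYNTHETIC_TEMPLATES}


def expand_synthetic(line: str) -> str:
    """Rewrite one modeled synthetic instruction; pass other lines through."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    key = (mnemonic, len(operands))
    if key in SYNTHETIC_TEMPLATES:
        if any(op.startswith("[") or op == "%y" or op.startswith("%asr") for op in operands):
            raise ValueError(f"form not modeled by this sketch: {line.strip()!r}")
        return SYNTHETIC_TEMPLATES[key].format(*operands)
    if mnemonic in SYNTHETIC_NAMES:
        raise ValueError(f"unsupported operand count for {mnemonic!r}: {line.strip()!r}")
    return line.strip()  # not one of the synthetic instructions modeled here


for src in ("cmp %o0, 5", "tst %l1", "neg %o1", "inc 4, %l0", "retl", "add %o0, 1, %o1"):
    print(f"{src:18} -> {expand_synthetic(src)}")
```

Keying the templates on operand count lets the one- and two-operand variants of not, neg, and inc share a mnemonic, mirroring how the table lists them as separate rows.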


TABLE C-2 Mapping Synthetic to SPARC V9 Instructions (1 of 3)


Synthetic Instruction         SPARC V9 Instruction(s)               Comment

cmp reg_rs1, reg_or_imm       subcc reg_rs1, reg_or_imm, %g0        Compare.
jmp address                   jmpl address, %g0
call address                  jmpl address, %o7
iprefetch label               bn,a,pt %xcc, label                   Originally envisioned as an
                                                                    encoding for an "instruction
                                                                    prefetch" operation, but
                                                                    functions as a NOP on all
                                                                    UltraSPARC Architecture
                                                                    implementations. (See
                                                                    PREFETCH function 17 on
                                                                    page 295 for an alternative
                                                                    method of prefetching
                                                                    instructions.)
tst reg_rs1                   orcc %g0, reg_rs1, %g0                Test.
ret                           jmpl %i7+8, %g0                       Return from subroutine.
TABLE C-2 Mapping Synthetic to SPARC V9 Instructions (2 of 3)


Synthetic Instruction         SPARC V9 Instruction(s)               Comment

retl                          jmpl %o7+8, %g0                       Return from leaf subroutine.
restore                       restore %g0, %g0, %g0                 Trivial RESTORE.
save                          save %g0, %g0, %g0                    Trivial SAVE.
                                                                    (Warning: trivial SAVE should
                                                                    only be used in kernel code!)
setuw value, reg_rd           sethi %hi(value), reg_rd              (When ((value & 3FF₁₆) == 0).)
                              - or -
                              or %g0, value, reg_rd                 (When 0 ≤ value ≤ 4095.)
                              - or -
                              sethi %hi(value), reg_rd;             (Otherwise)
                              or reg_rd, %lo(value), reg_rd         Warning: do not use setuw in
                                                                    the delay slot of a DCTI.
set value, reg_rd                                                   Synonym for setuw.
setsw value, reg_rd           sethi %hi(value), reg_rd              (When (value ≥ 0) and
                                                                    ((value & 3FF₁₆) == 0).)
                              - or -
                              or %g0, value, reg_rd                 (When -4096 ≤ value ≤ 4095.)
                              - or -
                              sethi %hi(value), reg_rd;             (Otherwise, if (value < 0) and
                              sra reg_rd, %g0, reg_rd               ((value & 3FF₁₆) == 0))
                              - or -
                              sethi %hi(value), reg_rd;             (Otherwise, if value ≥ 0)
                              or reg_rd, %lo(value), reg_rd
                              - or -
                              sethi %hi(value), reg_rd;             (Otherwise, if value < 0)
                              or reg_rd, %lo(value), reg_rd;
                              sra reg_rd, %g0, reg_rd               Warning: do not use setsw in
                                                                    the delay slot of a CTI.
setx value, reg, reg_rd       sethi %uhi(value), reg;               Create 64-bit constant.
                              or reg, %ulo(value), reg;             ("reg" is used as a temporary
                              sllx reg, 32, reg;                    register.)
                              sethi %hi(value), reg_rd;             Note: setx optimizations are
                              or reg_rd, reg, reg_rd;               possible but not enumerated
                              or reg_rd, %lo(value), reg_rd         here. The worst case is shown.
                                                                    Warning: do not use setx in the
                                                                    delay slot of a CTI.
signx reg_rs1, reg_rd         sra reg_rs1, %g0, reg_rd              Sign-extend 32-bit value to
signx reg_rd                  sra reg_rd, %g0, reg_rd               64 bits.
not reg_rs1, reg_rd           xnor reg_rs1, %g0, reg_rd             One's complement.


APPENDIX C * Assembly Language Syntax 617 


TABLE C-2 Mapping Synthetic to SPARC V9 Instructions (3 of 3)


Synthetic Instruction             SPARC V9 Instruction(s)                   Comment

not reg_rd                        xnor reg_rd, %g0, reg_rd                  One's complement.
neg reg_rs2, reg_rd               sub %g0, reg_rs2, reg_rd                  Two's complement.
neg reg_rd                        sub %g0, reg_rd, reg_rd                   Two's complement.
cas [reg_rs1], reg_rs2, reg_rd    casa [reg_rs1]#ASI_P, reg_rs2, reg_rd     Compare and swap.
casl [reg_rs1], reg_rs2, reg_rd   casa [reg_rs1]#ASI_P_L, reg_rs2, reg_rd   Compare and swap, little-endian.
casx [reg_rs1], reg_rs2, reg_rd   casxa [reg_rs1]#ASI_P, reg_rs2, reg_rd    Compare and swap extended.
casxl [reg_rs1], reg_rs2, reg_rd  casxa [reg_rs1]#ASI_P_L, reg_rs2, reg_rd  Compare and swap extended, little-endian.
inc reg_rd                        add reg_rd, 1, reg_rd                     Increment by 1.
inc const13, reg_rd               add reg_rd, const13, reg_rd               Increment by const13.
inccc reg_rd                      addcc reg_rd, 1, reg_rd                   Increment by 1; set icc & xcc.
inccc const13, reg_rd             addcc reg_rd, const13, reg_rd             Incr by const13; set icc & xcc.
dec reg_rd                        sub reg_rd, 1, reg_rd                     Decrement by 1.
dec const13, reg_rd               sub reg_rd, const13, reg_rd               Decrement by const13.
deccc reg_rd                      subcc reg_rd, 1, reg_rd                   Decrement by 1; set icc & xcc.
deccc const13, reg_rd             subcc reg_rd, const13, reg_rd             Decr by const13; set icc & xcc.
btst reg_or_imm, reg_rs1          andcc reg_rs1, reg_or_imm, %g0            Bit test.
bset reg_or_imm, reg_rd           or reg_rd, reg_or_imm, reg_rd             Bit set.
bclr reg_or_imm, reg_rd           andn reg_rd, reg_or_imm, reg_rd           Bit clear.
btog reg_or_imm, reg_rd           xor reg_rd, reg_or_imm, reg_rd            Bit toggle.
clr reg_rd                        or %g0, %g0, reg_rd                       Clear (zero) register.
clrb [address]                    stb %g0, [address]                        Clear byte.
clrh [address]                    sth %g0, [address]                        Clear half-word.
clr [address]                     stw %g0, [address]                        Clear word.
clrx [address]                    stx %g0, [address]                        Clear extended word.
clruw reg_rs1, reg_rd             srl reg_rs1, %g0, reg_rd                  Copy and clear upper word.
clruw reg_rd                      srl reg_rd, %g0, reg_rd                   Clear upper word.
mov reg_or_imm, reg_rd            or %g0, reg_or_imm, reg_rd
mov %y, reg_rd                    rd %y, reg_rd
mov %asrn, reg_rd                 rd %asrn, reg_rd
mov reg_or_imm, %y                wr %g0, reg_or_imm, %y
mov reg_or_imm, %asrn             wr %g0, reg_or_imm, %asrn
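To make the setuw, setsw, and setx rows concrete, the Python sketch below picks an expansion using the conditions listed in TABLE C-2 and simulates the 64-bit value left in the destination register, so each expansion can be checked against the requested constant. It assumes the usual SPARC V9 meanings of %hi, %lo, %uhi, and %ulo (bits 31:10, 9:0, 63:42, and 41:32 of the value) and that SETHI clears bits 63:32 of its destination. The function names and the default registers %o0 and %g1 are illustrative only. As the table notes, a real assembler may emit a shorter setx sequence; only the worst case is modeled here.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def hi(v: int) -> int:   # %hi(v): bits 31:10
    return (v >> 10) & 0x3FFFFF


def lo(v: int) -> int:   # %lo(v): bits 9:0
    return v & 0x3FF


def uhi(v: int) -> int:  # %uhi(v): bits 63:42
    return (v >> 42) & 0x3FFFFF


def ulo(v: int) -> int:  # %ulo(v): bits 41:32
    return (v >> 32) & 0x3FF


def sethi(imm22: int) -> int:
    """SETHI: imm22 into bits 31:10; all other bits, including 63:32, are zero."""
    return (imm22 & 0x3FFFFF) << 10


def sra_by_g0(r: int) -> int:
    """sra r, %g0, r: sign-extend bits 31:0 of r to 64 bits."""
    r &= MASK32
    return (r - (1 << 32) if r >> 31 else r) & MASK64


def setuw(value: int, rd: str = "%o0"):
    """Return (instruction text, resulting 64-bit value) per the setuw rows."""
    if not 0 <= value <= MASK32:
        raise ValueError("setuw operand must be an unsigned 32-bit value")
    if lo(value) == 0:
        return [f"sethi %hi({value:#x}), {rd}"], sethi(hi(value))
    if value <= 4095:
        return [f"or %g0, {value}, {rd}"], value
    return ([f"sethi %hi({value:#x}), {rd}", f"or {rd}, %lo({value:#x}), {rd}"],
            sethi(hi(value)) | lo(value))


def setsw(value: int, rd: str = "%o0"):
    """Return (instruction text, resulting 64-bit value) per the setsw rows."""
    if not -(1 << 31) <= value <= (1 << 31) - 1:
        raise ValueError("setsw operand must be a signed 32-bit value")
    if value >= 0 and lo(value) == 0:
        return [f"sethi %hi({value}), {rd}"], sethi(hi(value))
    if -4096 <= value <= 4095:
        return [f"or %g0, {value}, {rd}"], value & MASK64
    text, r = [f"sethi %hi({value}), {rd}"], sethi(hi(value))
    if lo(value) != 0:
        text.append(f"or {rd}, %lo({value}), {rd}")
        r |= lo(value)
    if value < 0:
        text.append(f"sra {rd}, %g0, {rd}")
        r = sra_by_g0(r)
    return text, r


def setx(value: int, tmp: str = "%g1", rd: str = "%o0"):
    """Return (instruction text, resulting value) for the worst-case setx sequence."""
    v = value & MASK64
    text = [
        f"sethi %uhi({v:#x}), {tmp}",
        f"or {tmp}, %ulo({v:#x}), {tmp}",
        f"sllx {tmp}, 32, {tmp}",
        f"sethi %hi({v:#x}), {rd}",
        f"or {rd}, {tmp}, {rd}",
        f"or {rd}, %lo({v:#x}), {rd}",
    ]
    t = ((sethi(uhi(v)) | ulo(v)) << 32) & MASK64   # tmp after the sllx
    return text, sethi(hi(v)) | t | lo(v)


for v in (0x400, 100, 0x12345678):
    print(setuw(v)[0])
print(setsw(-8192)[0])
print(setx(0x123456789ABCDEF0)[0])
```

For example, setsw(-8192) takes the "value < 0 and low 10 bits zero" path, so only sethi followed by sra is needed to produce the sign-extended result.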
Index





A 
a (annul) instruction field 
branch instructions, 156, 157, 159, 162, 176, 179 
accesses 
cacheable, 401 
I/O, 401 
restricted ASI, 405 
with side effects, 401, 412 
accrued exception (aexc) field of FSR register, 66, 
468, 588 
ADD instruction, 148 
ADDC instruction, 148 
ADDcc instruction, 148, 328 
ADDCcc instruction, 148 
address 
aliasing, 519 
operand syntax, 614 
separation of virtual and real, 519 
space identifier (ASI), 421 
address mask (am) field of PSTATE register 
description, 97 
address space, 7, 22 
address space identifier (ASI), 7, 400 
accessing MMU registers, 526 
appended to memory address, 27, 114 
architecturally specified, 405 
changed in, 447 
changed in UA 
ASI_REAL, 447
ASI_REAL_IO, 447
ASI_REAL_IO_LITTLE, 447
ASI_REAL_LITTLE, 447
ASI_TWINX_N, 447
ASI_TWINX_NL, 447
ASI_TWINX_NUCLEUS_LITTLE, 447
ASI_TWINX_R, 447
ASI_TWINX_REAL, 447
ASI_TWINX_REAL_L, 447
ASI_TWINX_REAL_LITTLE, 447
definition, 7
encoding address space information, 115
explicit, 122
explicitly specified in instruction, 122
implicit, See implicit ASIs
nontranslating, 13, 269, 355
nontranslating ASI, 422
operations, 526
with prefetch instructions, 296
real ASI, 422
real-translating ASIs, 422
restricted, 405, 421
hyperprivileged, 406
privileged, 406
restriction indicator, 74
SPARC V9 address, 403
translating ASI, 422
unrestricted, 406, 421
virtual-translating ASI, 422
address space identifier (ASI) register
for load/store alternate instructions, 74
address for explicit ASI, 122
and LDDA instruction, 254, 267
and LDSTUBA instruction, 263
load integer from alternate space
instructions, 244
with prefetch instructions, 296
for register-immediate addressing, 406
restoring saved state, 168, 313
saving state, 453
and STDA instruction, 354
store floating-point into alternate space
instructions, 341
store integer to alternate space instructions, 332
and SWAPA instruction, 361
after trap, 33
and TSTATE register, 93
and write state register instructions, 377
addressing modes, 22 
ADDX instruction (SPARC V8), 148 
ADDXcc instruction (SPARC V8), 148 
AFAR, See Asynchronous Fault Address register 
(AFAR) 
AFSR, See Asynchronous Fault Status register 
(AFSR) 
alias 
floating-point registers, 55 
aliased, 7 
ALIGNADDRESS instruction, 149 
ALIGNADDRESS_LITTLE instruction, 149
alignment 
data (load/store), 28, 116, 403
doubleword, 28, 116, 403 
extended-word, 116 
halfword, 28, 116, 403 
instructions, 28, 116, 403 
integer registers, 266, 268 
memory, 403, 503 
quadword, 28, 116, 403 
word, 28, 116, 403 
ALLCLEAN instruction, 150 
alternate space instructions, 29, 74 
ancillary state registers (ASRs) 
access, 70 
assembly language syntax, 610 
I/O register access, 29 
possible registers included, 304, 378 
privileged, 31, 588 
reading/writing implementation-dependent
processor registers, 31, 588 
writing to, 377 
AND instruction, 151 
ANDcc instruction, 151 
ANDN instruction, 151 
ANDNcc instruction, 151 
annul bit
in branch instructions, 162
in conditional branches, 177
annulled branches, 162
application program, 7, 70
architectural direction note, 5
architecture, meaning for SPARC V9, 21
arithmetic overflow, 73
ARRAY16 instruction, 152
ARRAY32 instruction, 152
ARRAY8 instruction, 152
ASI, 7
invalid, and data_access_exception, 499
ASI register, 70
ASI, See address space identifier (ASI)
ASI_*REAL* ASIs, 404
ASI_AIPN, 442
ASI_AIPN_L, 442
ASI_AIPP, 442
ASI_AIPP_L, 442
ASI_AIPS, 442
ASI_AIPS_L, 442
ASI_AIUP, 424, 437
ASI_AIUPL, 424, 438
ASI_AIUS, 424, 437
ASI_AIUS_L, 270
ASI_AIUSL, 424, 438
ASI_AS_IF_PRIV_NUCLEUS, 442
ASI_AS_IF_PRIV_NUCLEUS_LITTLE, 442
ASI_AS_IF_PRIV_PRIMARY, 442
ASI_AS_IF_PRIV_PRIMARY_LITTLE, 442
ASI_AS_IF_PRIV_SECONDARY, 442
ASI_AS_IF_PRIV_SECONDARY_LITTLE, 442
ASI_AS_IF_USER*, 98, 403, 404
ASI_AS_IF_USER* ASIs, 403
ASI_AS_IF_USER_NONFAULT_LITTLE, 407
ASI_AS_IF_USER_PRIMARY, 424, 437, 505
ASI_AS_IF_USER_PRIMARY_LITTLE, 406, 424, 438, 498
ASI_AS_IF_USER_SECONDARY, 406, 424, 437, 498, 505
ASI_AS_IF_USER_SECONDARY_LITTLE, 406, 424, 438, 498
ASI_AS_IF_USER_SECONDARY_NOFAULT_LITTLE, 407
ASI_BLK_AIUP, 424, 437
ASI_BLK_AIUPL, 424, 438
ASI_BLK_AIUS, 424, 437
ASI_BLK_AIUSL, 424, 438
ASI_BLK_P, 434
ASI_BLK_PL, 434
ASI_BLK_S, 434
ASI_BLK_SL, 434
ASI_BLOCK_AS_IF_USER_PRIMARY, 424, 437
ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE, 424, 438
ASI_BLOCK_AS_IF_USER_SECONDARY, 424, 437
ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE, 424, 438
ASI_BLOCK_PRIMARY, 434
ASI_BLOCK_PRIMARY_LITTLE, 434
ASI_BLOCK_SECONDARY, 434
ASI_BLOCK_SECONDARY_LITTLE, 434
ASI_CMT_PER_CORE, 431
ASI_CMT_PER_STRAND, 431, 537, 538
ASI_CMT_SHARED, 427, 541, 542, 545, 548, 552
ASI_DEVICE_ID+SERIAL_ID, 435
ASI_DMMU, 431
ASI_DMMU_DEMAP, 431
ASI_DTLB_DATA_ACCESS_REG, 431
ASI_DTLB_DATA_IN_REG, 431
ASI_DTLB_TAG_READ_REG, 431
ASI_FL16_P, 433
ASI_FL16_PL, 433
ASI_FL16_PRIMARY, 433
ASI_FL16_PRIMARY_LITTLE, 433
ASI_FL16_S, 433
ASI_FL16_SECONDARY, 433
ASI_FL16_SECONDARY_LITTLE, 433
ASI_FL16_SL, 433
ASI_FL8_P, 433
ASI_FL8_PL, 433
ASI_FL8_PRIMARY, 433
ASI_FL8_PRIMARY_LITTLE, 433
ASI_FL8_S, 433
ASI_FL8_SECONDARY, 433
ASI_FL8_SECONDARY_LITTLE, 433
ASI_FL8_SL, 433
ASI_IMMU, 428
ASI_IMMU_DEMAP, 431
ASI_ITLB_DATA_ACCESS_REG, 430
ASI_ITLB_TAG_READ_REG, 430
ASI_MMU, 430
ASI_MMU_CONTEXTID, 425
ASI_MMU_REAL, 428
ASI_N, 423
ASI_NL, 424
ASI_NUCLEUS, 122, 423
ASI_NUCLEUS_LITTLE, 122, 424
ASI_NUCLEUS_QUAD_LDD (deprecated), 447
ASI_NUCLEUS_QUAD_LDD_L (deprecated), 447
ASI_NUCLEUS_QUAD_LDD_LITTLE (deprecated), 447
ASI_P, 432
ASI_PHY_BYPASS_EC_WITH_EBIT_L, 447
ASI_PHYS_BYPASS_EC_WITH_EBIT, 447
ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE, 447
ASI_PHYS_USE_EC, 447
ASI_PHYS_USE_EC_L, 447
ASI_PHYS_USE_EC_LITTLE, 447
ASI_PL, 432
ASI_PNF, 432
ASI_PNFL, 432
ASI_PRIMARY, 122, 405, 406, 432
ASI_PRIMARY_LITTLE, 122, 405, 432
ASI_PRIMARY_NO_FAULT, 402, 419, 432, 499
ASI_PRIMARY_NO_FAULT_LITTLE, 402, 419, 432, 499
ASI_PRIMARY_NOFAULT_LITTLE, 407
ASI_PST16_P, 347, 432
ASI_PST16_PL, 347, 433
ASI_PST16_PRIMARY, 432
ASI_PST16_PRIMARY_LITTLE, 433
ASI_PST16_S, 347, 432
ASI_PST16_SECONDARY, 432
ASI_PST16_SECONDARY_LITTLE, 433
ASI_PST16_SL, 347
ASI_PST32_P, 347, 432
ASI_PST32_PL, 347, 433
ASI_PST32_PRIMARY, 432
ASI_PST32_PRIMARY_LITTLE, 433
ASI_PST32_S, 347, 432
ASI_PST32_SECONDARY, 432
ASI_PST32_SECONDARY_LITTLE, 433
ASI_PST32_SL, 347, 433
ASI_PST8_P, 432
ASI_PST8_PL, 432
ASI_PST8_PRIMARY, 432
ASI_PST8_PRIMARY_LITTLE, 432
ASI_PST8_S, 432
ASI_PST8_SECONDARY, 432
ASI_PST8_SECONDARY_LITTLE, 432
ASI_PST8_SL, 347, 432
ASI_QUAD_LDD_L (deprecated), 447
ASI_QUAD_LDD_LITTLE (deprecated), 447
ASI_QUAD_LDD_PHYS (deprecated), 447
ASI_QUAD_LDD_REAL (deprecated), 426
ASI_QUAD_LDD_REAL_LITTLE (deprecated), 426
ASI_REAL, 424, 438, 447
ASI_REAL_IO, 424, 438, 447
ASI_REAL_IO_L, 424
ASI_REAL_IO_LITTLE, 424, 439, 447
ASI_REAL_L, 424
ASI_REAL_LITTLE, 424, 439, 447
ASI_S, 432
ASI_SECONDARY, 432
ASI_SECONDARY_LITTLE, 432
ASI_SECONDARY_NO_FAULT, 419, 432, 499
ASI_SECONDARY_NO_FAULT_LITTLE, 419, 432, 499
ASI_SECONDARY_NOFAULT, 407
ASI_SL, 432
ASI_SNF, 432
ASI_SNFL, 432
ASI_TWINX_AIUP, 270, 425, 440
ASI_TWINX_AIUP_L, 270, 440
ASI_TWINX_AIUPL, 426
ASI_TWINX_AIUS, 270, 440
ASI_TWINX_AIUS_L, 426, 440
ASI_TWINX_AS_IF_USER_PRIMARY, 425, 440
ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE, 426, 440
ASI_TWINX_AS_IF_USER_SECONDARY, 425, 440
ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE, 426, 440
ASI_TWINX_N, 270, 426, 447
ASI_TWINX_NL, 270, 427, 440, 447
ASI_TWINX_NUCLEUS, 426, 440, 447
ASI_TWINX_NUCLEUS[_L], 403
ASI_TWINX_NUCLEUS_LITTLE, 427, 440, 447
ASI_TWINX_P, 270, 434
ASI_TWINX_PL, 270, 434
ASI_TWINX_PRIMARY, 434, 443
ASI_TWINX_PRIMARY_LITTLE, 434, 443
ASI_TWINX_R, 426, 441, 447
ASI_TWINX_REAL, 270, 426, 441, 447
ASI_TWINX_REAL[_L], 403
ASI_TWINX_REAL_L, 426, 441, 447
ASI_TWINX_REAL_LITTLE, 426, 441, 447
ASI_TWINX_S, 270, 434
ASI_TWINX_SECONDARY, 424, 443
ASI_TWINX_SECONDARY_LITTLE, 434, 443
ASI_TWINX_SL, 270, 434
ASI_UMMU, 431
ASR, 7
asr_reg, 610

atomic 


memory operations, 271, 414, 416 
store doubleword instruction, 352, 354 
store instructions, 331, 332 

atomic load-store instructions 
compare and swap, 165 
load-store unsigned byte, 262, 361 


load-store unsigned byte to alternate space, 263 


simultaneously addressing doublewords, 360 


swap R register with alternate space 
memory, 361 
swap R register with memory, 165, 360 
atomicity, 402, 595 
available (core), 7 


B 

BA instruction, 156, 157, 581 

BCC instruction, 156, 581 

bclr synthetic instruction, 618

BCS instruction, 156, 581 

BE instruction, 156, 581 

Berkeley RISCs, 24 

BG instruction, 156, 581 

BGE instruction, 156, 581 

BGU instruction, 156, 581 

Bicc instructions, 156, 575 

big-endian, 7 

big-endian byte order, 28, 96, 117 
in hyperprivileged mode, 468 

binary compatibility, 24 

BL instruction, 581 

BLD, 7 

BLD, See LDBLOCKF instruction 

BLE instruction, 156, 581 

BLEU instruction, 156, 581 

block load instructions, 56, 247, 443 

block store instructions, 56, 335, 443 

blocked byte formatting, 153 

BMASK instruction, 158 

BN instruction, 156, 581 

BNE instruction, 156, 581 

BNEG instruction, 156, 581 

BP instructions, 581 

BPA instruction, 159, 581 

BPCC instruction, 159, 581 

BPcc instructions, 73, 74, 159, 582 

BPCS instruction, 159, 581 

BPE instruction, 159, 581 

BPG instruction, 159, 581 




BPGE instruction, 159, 581 
BPGU instruction, 159, 581 
BPL instruction, 159, 581 
BPLE instruction, 159, 581 
BPLEU instruction, 159, 581 
BPN instruction, 159, 581 
BPNE instruction, 159, 581 
BPNEG instruction, 159, 581 
BPOS instruction, 156, 581 
BPPOS instruction, 159, 581 
BPr instructions, 162, 581 
BPVC instruction, 159, 581 
BPVS instruction, 159, 581 
branch 

annulled, 162 

delayed, 113 

elimination, 130 

fcc-conditional, 177, 179 

icc-conditional, 157 

instructions
on floating-point condition codes, 176
on floating-point condition codes with
prediction, 178
on integer condition codes with prediction
(BPcc), 159
on integer condition codes, See Bicc instructions
when contents of integer register match 
condition, 162 
prediction bit, 162 
unconditional, 156, 160, 176, 179 
with prediction, 22 
BRGEZ instruction, 162 
BRGZ instruction, 162 
BRLEZ instruction, 162 
BRLZ instruction, 162 
BRNZ instruction, 162 
BRZ instruction, 162 
bset synthetic instruction, 618 
BSHUFFLE instruction, 158 
BST, 7 
BST, See STBLOCKF instruction 
btog synthetic instruction, 618 
btst synthetic instruction, 618 
BVC instruction, 156, 581 
BVS instruction, 156, 581 
byte, 7 
addressing, 122 
data format, 35 


order, 28 
order, big-endian, 28 
order, little-endian, 28 
byte order 
big-endian, 96 
in hyperprivileged mode, 468 
implicit, 96 
in trap handlers, 468 
little-endian, 96 


C 


cache 
coherency protocol, 401 
data, 409 
instruction, 409 
miss, 301 
nonconsistent instruction cache, 409 
cacheable accesses, 400 
caching, TSB, 525 
CALL instruction 
description, 164 
displacement, 30, 31 
does not change CWP, 53 
and JMPL instruction, 241 
writing address into R[15], 55 
call synthetic instruction, 616 
CANRESTORE (restorable windows) register, 88 
and clean_window exception, 131
and CLEANWIN register, 88, 90, 507 
counting windows, 90 
decremented by RESTORE instruction, 309 
decremented by SAVED instruction, 319 
detecting window underflow, 53 
if registered window was spilled, 310 
incremented by SAVE instruction, 317 
modified by NORMALW instruction, 289 
modified by OTHERW instruction, 291 
range of values, 86, 596 
RESTORE instruction, 131 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 568 
window underflow, 507 
CANSAVE (savable windows) register, 87 
decremented by SAVE instruction, 317 
detecting window overflow, 53 
FLUSHW instruction, 192 
if equals zero, 131 




incremented by RESTORE, 309 
incremented by SAVED instruction, 319 
range of values, 86, 596 
SAVE instruction, 508 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 568 
window overflow, 507 
CAS synthetic instruction, 416 
CASA instruction, 165 
32-bit compare-and-swap, 415 
alternate space addressing, 29 
and data_access_exception (noncacheable page)
exception, 499 
atomic operation, 262 
hardware primitives for mutual exclusion of 
CASXA, 414 
in multiprocessor system, 263, 360, 361 
R register use, 115 
word access (memory), 116 
casn synthetic instructions, 618 
CASX synthetic instruction, 415, 416 
CASXA instruction, 165 
64-bit compare-and-swap, 415 
alternate space addressing, 29 
and data_access_exception (noncacheable page)
exception, 499 
atomic operation, 263 
doubleword access (memory), 116 
hardware primitives for mutual exclusion of 
CASA, 414 
in multiprocessor system, 262, 263, 360, 361 
R register use, 115 
catastrophic_error exception, 454
CC0 instruction field
branch instructions, 159, 179 
floating point compare instructions, 183 
move instructions, 281, 582 
CC1 instruction field
branch instructions, 159, 179 
floating point compare instructions, 183 
move instructions, 281, 582 
CC2 instruction field 
move instructions, 281, 582 
CCR (condition codes register), 8 
CCR (condition codes) register, 72 
32-bit operation (icc) bit of condition field, 73, 74 
64-bit operation (xcc) bit of condition field, 73, 
74 


ADD instructions, 148 
ASR for, 70 
carry (C) bit of condition fields, 73 
icc field, See CCR.icc field 
MULScc instruction, 285 
negative (n) bit of condition fields, 73 
overflow bit (V) in condition fields, 73 
restored by RETRY instruction, 168, 313 
saved after trap, 453 
saving after trap, 33 
state after reset, 567 
TSTATE register, 93 
write instructions, 377 
xcc field, See CCR.xcc field 
zero (Z) bit of condition fields, 73 
CCR.icc field 
add instructions, 148, 363 
bit setting for signed division, 322 
bit setting for signed/unsigned multiply, 329,
374 
bit setting for unsigned division, 373 
branch instructions, 157, 160, 281 
integer subtraction instructions, 359 
logical operation instructions, 151, 290, 385 
MULScc instruction, 285 
Tcc instruction, 367
CCR.xcc field 
add instructions, 148, 363 
bit setting for signed/unsigned divide, 322, 373 
bit setting for signed/unsigned multiply, 329, 
374 
branch instructions, 160, 281 
logical operation instructions, 151, 290, 385 
subtract instructions, 359 
Tcc instruction, 367
clean register window, 317, 498 
clean window, 8 
and window traps, 90, 506 
CLEANWIN register, 90 
definition, 507 
number is zero, 131 
trap handling, 509 
clean_window exception, 88, 131, 318, 473, 498, 507,
591 
CLEANWIN (clean windows) register, 88 
CANSAVE instruction, 131 
clean window counting, 88 
incremented by trap handler, 509 
range of values, 86, 596 




specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
specifying number of available clean 
windows, 507 
state after reset, 568 
value calculation, 90 
clock cycle, counts for virtual processor, 75 
clock tick registers, See TICK and STICK registers 
clock-tick register (TICK), 504 
clrn synthetic instructions, 618 
CMP 
disabling a core, 541 
parking a core, 544 
cmp synthetic instruction, 359, 616 
CMT, 8, 529, 532
enabling a core, 541
ERROR_STEERING register, 554, 556
Programming Model, 530
registers, 536
STRAND_AVAILABLE register, 537, 541
STRAND_ENABLE register, 542
STRAND_ENABLE_STATUS register, 542
STRAND_ID register, 537
STRAND_INTR_ID register, 516, 538
STRAND_RUNNING register
simultaneous updates, 546
STRAND_RUNNING register, 545
STRAND_RUNNING_STATUS, 548
unparking a core, 544
XIR_STEERING register, 552
code 
self-modifying, 416 
coherence, 8 
between processors, 595 
data cache, 409 
domain, 401 
memory, 402 
unit, memory, 403 
compare and swap instructions, 165 
comparison instruction, 124, 359 
compatibility note, 5 
completed (memory operation), 8 
compliant SPARC V9 implementation, 25 
cond instruction field 
branch instructions, 157, 159, 177, 179 
floating point move instructions, 195 
move instructions, 281 
condition codes 
adding, 363 





effect of compare-and-swap instructions, 166 
extended integer (xcc), 74 
floating-point, 177 
icc field, 73 
integer, 72 
results of integer operation (icc), 74 
subtracting, 359, 369 
trapping on, 367 
xcc field, 73 
condition codes register, See CCR register 
conditional branches, 157, 177, 179 
conditional move instructions, 32 
conforming SPARC V9 implementation, 25 
consistency 
between instruction and data spaces, 416 
processor, 409, 413 
processor self-consistency, 412 
sequential, 402, 410, 411 
strong, 411 
const22 instruction field of ILLTRAP 
instruction, 237 
constants, generating, 323 
context, 8 
nucleus, 190 
context identifier, 404 
control transfer 
pseudo-control-transfer via WRPR to 
PSTATE.am, 99 
control-transfer instructions, 30 
control-transfer instructions (CTIs), 30, 168, 313 
conventions 
font, 2 
notational, 3 
conversion 
between floating-point formats instructions, 233 
floating-point to integer instructions, 231, 389 
integer to floating-point instructions, 187, 236 
planar to packed, 221 
copyback, 8 
core, 8 
CPI, 8 
CPU, pipeline draining, 87, 91 
cpu_mondo exception, 498
cross-call, 8 
CTI, 8, 17
current exception (cexc) field of FSR register, 67, 
134, 588 
current window, 8 
current window pointer register, See CWP register 
current little endian (cle) field of PSTATE
register, 96, 405 
CWP (current window pointer) register 
and instructions 
CALL and JMPL instructions, 53 
FLUSHW instruction, 192 
RDPR instruction, 307 
RESTORE instruction, 131, 309 
SAVE instruction, 131, 309, 317 
WRPR instruction, 382 
and traps 
after spill trap, 508 
after spill/fill trap, 33 
on window trap, 508 
saved by hardware, 453 
CWP (current window pointer) register, 87 
clean windows, 88 
definition, 8 
incremented/decremented, 52, 309, 317
overlapping windows, 52 
range of values, 86, 596 
restored during RETRY, 168, 313 
specifying windows for use without 
cleaning, 507 
state after reset, 567 
and TSTATE register, 93 
updated during a WDR reset, 566 


D 
D superscript on instruction name, 138 
d16hi instruction field 
branch instructions, 162 
d16lo instruction field 
branch instructions, 162 
data 
access, 8 
cache coherence, 409 
conversion between SIMD formats, 43 
flow order constraints 
memory reference instructions, 408 
register reference instructions, 408 
formats 
byte, 35 
doubleword, 35 
halfword, 35 
Int16 SIMD, 44 
Int32 SIMD, 44 
quadword, 35 


tagged word, 35 
Uint8 SIMD, 44 
word, 35 
memory, 418 
types 
floating-point, 35 
signed integer, 35 
unsigned integer, 35 
width, 35 
Data Cache Unit Control register, See DCUCR 
Data Synchronous Fault Address register, See D-SFAR
Data Synchronous Fault Status register, See D-SFSR 
data_access_error exception, 498 
with load instructions, 272 
data_access_exception (invalid ASI) exception 
with load alternate instructions, 245 
data_access_exception exception, 498 
register update policy, 526 
with compare-and-swap instructions, 167 
with LD instructions, 243 
with LDSHORTF instructions, 246, 249 
with LDTXA instructions, 272 
with load instructions, 252, 266, 269, 274 
with load instructions and ASIs, 256, 440, 441, 
443, 444, 445 
with store instructions and ASIs, 256, 440, 441, 
443, 444, 445 
with STPARTIALF instructions, 349 
with SWAPA instruction, 362 
data_access_MMU_error exception
on PREFETCH, 297, 302
with CASA instruction, 167
with load instructions, 243, 246, 250, 253, 257,
259, 262, 264, 266, 269, 272, 274
with store instructions, 331, 349, 351, 353, 356
with SWAP instruction, 360, 362
data_access_MMU_miss exception
with integer load instructions, 243
with load alternate instructions, 246
with load instructions, 250
with PREFETCH instruction, 297
data_access_protection exception
(superseded), 506
data_invalid_TSB_entry exception, 499
data_real_translation_miss exception, 500
DCTI couple, 129 
DCTI instructions, 8 
behavior, 113 
RETURN instruction effects, 315
dec synthetic instructions, 618 
deccc synthetic instructions, 618 
deferred trap, 461 

distinguishing from disrupting trap, 463 

floating-point, 308 

restartable 

implementation dependency, 463 

software actions, 463 
delay instruction 

and annul field of branch instruction, 177 

annulling, 30 

conditional branches, 179 

DONE instruction, 168 

executed after branch taken, 162 

following delayed control transfer, 30 

RETRY instruction, 313 

RETURN instruction, 315 

unconditional branches, 179 

with conditional branch, 160 
delayed branch, 113 
delayed control transfer, 162 
delayed CTI, See DCTI 
demap, 8 
denormalized number, 8 
deprecated, 9 
deprecated exceptions 

tag overflow, 505 
deprecated instructions 

FBA, 176 

FBE, 176 

FBG, 176 

FBGE, 176 

FBL, 176 

FBLE, 176 

FBLG, 176 

FBN, 176 

FBNE, 176 

FBO, 176 

FBU, 176 

FBUE, 176 

FBUGE, 176 

FBUL, 176 

FBULE, 176 

LDFSR, 258 

LDTW, 265 

LDTWA, 267 

MULScc, 72, 285 

RDY, 70, 72, 303 





SDIV, 72, 321 

SDIVcc, 72, 321 

SMUL, 72, 329 

SMULcc, 72, 329 

STFSR, 345 

STTW, 352 

STTWA, 354 

SWAP, 360 

SWAPA, 361 

TADDccTV, 364 

TSUBccTV, 370 

UDIV, 72, 372 

UDIVcc, 72, 372 

UMUL, 72, 374 

UMULcc, 72, 374 

WRY, 70, 72, 376 
dev_mondo exception, 500
disable (core), 9 
disabled (core), 9 
disabling CMP core, 541 
disp19 instruction field 

branch instructions, 159, 179 
disp22 instruction field 

branch instructions, 156, 177 
disp30 instruction field 

word displacement (CALL), 164 
disrupting trap, 463 

differences from reset trap, 466 
divide instructions, 30, 287, 321, 372 
division_by_zero exception, 125, 287, 500
division-by-zero bits of FSR.aexc/FSR.cexc 

fields, 69 
DMMU Tag Access register 


context field after data_access_exception, 526


DONE instruction, 168 
effect on HTSTATE, 107 
effect on TNPC register, 92 
effect on TSTATE register, 94 
executed in RED state, 458 
generating illegal_instruction exception, 502
modifying CCR.xcc condition codes, 73 
return from trap, 453 
return from trap handler with different GL 
value, 104 
target address, 31 
doubleword, 9 
addressing, 120 
alignment, 28, 116, 403 
data format, 35 
definition, 9
D-SFAR 

state after reset, 569 
D-SFSR register, See SFSR register 
DuTLB, disabled, 499 


E 
EDGE16 instruction, 170 
EDGE16L instruction, 170 
EDGE16LN instruction, 172
EDGE16N instruction, 172
EDGE32 instruction, 170 
EDGE32L instruction, 170 
EDGE32LN instruction, 172 
EDGE32N instruction, 172 
EDGE8 instruction, 170
EDGE8L instruction, 170
EDGE8LN instruction, 172
EDGE8N instruction, 172
emulating multiple unsigned condition codes, 130 
enable (core), 9 
enable floating-point 
See FPRS register, fef field 
See PSTATE register, pef field 
enabled (core), 9 
enabling CMP core, 541 
error state, 460 
definition, 456 
effects when entering, 590 
entering, 486, 488, 494, 495, 496 
exiting, 486 
recognizing interrupts, 487 
and RED state, 457 
ERROR_STEERING register, 556
even parity, 9 
exception, 9 
exceptions
See also individual exceptions
catastrophic_error, 454
causing traps, 453
clean_window, 473, 498, 591
cpu_mondo, 498
data_access_error, 498
data_access_exception, 498
data_access_MMU_error
on PREFETCH, 297
data_access_MMU_error, 167, 243, 246, 250, 253,
257, 259, 262, 264, 266, 269, 272, 274, 302, 331,
349, 351, 353, 356, 360
data_access_MMU_miss, 297
data_access_protection (superseded), 506
data_invalid_TSB_entry, 499
data_real_translation_miss, 500
definition, 454
dev_mondo, 500
division_by_zero, 500
fast_data_access_MMU_miss, 297, 500
fast_data_access_protection, 500
fast_ECC_error, 506
fill_n_normal, 500
fill_n_other, 500
fp_disabled
and GSR, 80
fp_disabled, 500
fp_exception_ieee_754, 500
fp_exception_other, 501
guest_watchdog, 501
hstick_match, 107, 110, 501
htrap_instruction, 501
illegal_instruction
and SIR instruction, 466
illegal_instruction, 105, 501
instruction_access_error, 502
instruction_access_exception, 502
instruction_breakpoint, 502
instruction_invalid_TSB_entry, 502
instruction_real_translation_miss, 502
internal_processor_error, 502
interrupt_level_14
and SOFTINT.int_level, 82
and STICK_CMPR.stick_cmpr, 86
and TICK_CMPR.tick_cmpr, 84
interrupt_level_14, 503
interrupt_level_15
and SOFTINT.int_level, 82
interrupt_level_15, 503
interrupt_level_n
and SOFTINT register, 81
and SOFTINT.int_level, 82
interrupt_level_n, 464, 503
LDDF_mem_address_not_aligned, 503
LDQF_mem_address_not_aligned, 506
mem_address_not_aligned, 503
nonresumable_error, 503
PA_watchpoint, 503
pending, 33
pic_overflow, 503
power_on_reset, 503
privileged_action, 503
privileged_opcode
and access to register-window PR state
registers, 86, 91, 101, 104
and access to SOFTINT, 81
and access to SOFTINT_CLR, 83
and access to SOFTINT_SET, 82
and access to STICK_CMPR, 85
and access to TICK_CMPR, 83
privileged_opcode, 504
RA_watchpoint, 477, 483
RED_state_exception, 504
resumable_error, 504
software_initiated_reset, 504
spill_n_normal, 318, 504
spill_n_other, 318, 504
STDF_mem_address_not_aligned, 504
store_error, 504
STQF_mem_address_not_aligned, 506
tag_overflow (deprecated), 505
trap_instruction, 505
trap_level_zero
state after reset, 567
trap_level_zero, 505
unimplemented_LDTW, 505
unimplemented_STTW, 505
VA_watchpoint, 505
watchdog_reset
and guest_watchdog, 455
watchdog_reset, 505
window fill, 473
window spill, 473
execute unit, 407 
execute state 
and error state, 488 
and RED state, 488 
returning to, 486 
trap processing, 456, 486 
explicit ASI, 9, 122, 423 
extended word, 9 
addressing, 120 
externally initiated reset (XIR), 494, 500, 565 
causing entry into RED state, 457 
and error state, 472
for critical system events, 466
for debugging, 459
partial-processor, 551
RED state trap processing, 490
to virtual processor, 565 


F 

F registers, 10, 26, 133, 387, 468
FABSd instruction, 173, 579, 580
FABSq instruction, 173, 579, 580
FABSs instruction, 173
FADD, 174
FADDd instruction, 174
FADDq instruction, 174
FADDs instruction, 174
FALIGNDATA instruction, 175
FAND instruction, 229
FANDNOT1 instruction, 229
FANDNOT1S instruction, 229
FANDNOT2 instruction, 229
FANDNOT2S instruction, 229
FANDS instruction, 229
fast_data_access_MMU_miss exception, 500
register update policy, 526
with integer load instructions, 243
with load alternate instructions, 246
with PREFETCH instruction, 297
fast_data_access_protection exception, 500
register update policy, 526
write permission not granted, 524
fast_ECC_error exception, 506
fast_instruction_access_MMU_miss exception
register update policy, 526
FBA instruction, 176, 177, 581
FBE instruction, 176, 581
FBfcc instructions, 61, 176, 500, 575, 581
FBG instruction, 176, 581
FBGE instruction, 176, 581
FBL instruction, 176, 581
FBLE instruction, 176, 581
FBLG instruction, 176, 581
FBN instruction, 176, 581
FBNE instruction, 176, 581
FBO instruction, 176, 581
FBPA instruction, 178, 179, 581
FBPE instruction, 178, 581
FBPfcc instructions, 61, 178, 575, 581, 582
FBPG instruction, 178, 581
FBPGE instruction, 178, 581
FBPL instruction, 178, 581
FBPLE instruction, 178, 581
FBPLG instruction, 178, 581
FBPN instruction, 178, 179, 581
FBPNE instruction, 178, 581
FBPO instruction, 178, 581
FBPU instruction, 178, 581
FBPUE instruction, 178, 581
FBPUG instruction, 178, 581
FBPUGE instruction, 178, 581
FBPUL instruction, 178, 581
FBPULE instruction, 178, 581
FBU instruction, 176, 581
FBUE instruction, 176, 581
FBUG instruction, 176, 581
FBUGE instruction, 176, 581
FBUL instruction, 176, 581
FBULE instruction, 176, 581
fcc-conditional branches, 177, 179
fccn, 9
FCMP instructions, 582
FCMP* instructions, 61, 62, 183
FCMPd instruction, 183, 580
FCMPE instructions, 582
FCMPE* instructions, 61, 62, 183
FCMPEd instruction, 183, 580
FCMPEq instruction, 183, 580
FCMPEQ16 instruction, 180
FCMPEQ32 instruction, 180
FCMPEs instruction, 183, 580
FCMPGT instruction, 180
FCMPGT16 instruction, 180
FCMPGT32 instruction, 180
FCMPLE16 instruction, 180
FCMPLE32 instruction, 180
FCMPNE16 instruction, 180, 181
FCMPNE32 instruction, 180, 181
FCMPq instruction, 183, 580
FCMPs instruction, 183, 580
fcn instruction field
DONE instruction, 168
PREFETCH, 295
RETRY instruction, 313
FDIVd instruction, 185
FDIVq instruction, 185
FDIVs instruction, 185
FdMULq instruction, 209
FdTOi instruction, 231, 389
FdTOq instruction, 233
FdTOs instruction, 233
FdTOx instruction, 231, 580
fef field of FPRS register, 77 
and access to GSR, 80 
and fp disabled exception, 500 
branch operations, 177, 179 
byte permutation, 158 
comparison operations, 181, 184 
data movement operations, 282 
enabling FPU, 97 
floating-point operations, 173, 174, 185, 187, 193, 
198, 201, 209, 211, 230, 231, 233, 235, 236, 251, 
254, 258, 260, 273 
integer arithmetic operations, 220, 225 
logical operations, 226, 227, 229 
memory operations, 249 
read operations, 305, 325, 337 
special addressing operations, 149, 175, 339, 345, 
349, 351, 357, 378 
fef, See FPRS register, fef field 
FEXPAND instruction, 186 
FEXPAND operation, 186 
fill handler, 310 
fill register window, 500
overflow/underflow, 53
RESTORE instruction, 90, 309, 507
RESTORED instruction, 132, 311, 509
RETRY instruction, 508
selection of, 507
trap handling, 508
trap vectors, 310
window state, 90
fill_n_normal exception, 310, 316, 500
fill_n_other exception, 310, 316, 500
FiTOd instruction, 187 
FiTOq instruction, 187 
FiTOs instruction, 187 
fixed values, 238 
fixed-point scaling, 204 
floating point 
absolute value instructions, 173 
add instructions, 174 
compare instructions, 61, 62, 183
condition code bits, 177 
condition codes (fcc) fields of FSR register, 64, 
177, 179, 183 
data type, 35 
deferred-trap queue (FQ), 308 
divide instructions, 185 
exception, 10
exception, encoding type, 63
FPRS register, 377
FSR condition codes, 62
move instructions, 193
multiply instructions, 209
negate instructions, 211
operate (FPop) instructions, 10, 32, 63, 67, 133,
258
registers
destination F, 387
FPRS, See FPRS register
FSR, See FSR register
programming, 59
rounding direction, 62
square root instructions, 230
subtract instructions, 235
trap types, 10
IEEE_754_exception, 64, 65, 67, 70, 387, 388
invalid_fp_register, 173, 174
unfinished_FPop, 64, 65, 70, 174, 185, 210,
234, 235, 388
results after recovery, 65
unimplemented_FPop, 65, 70, 173, 174, 184,
185, 187, 193, 199, 202, 210, 211, 232, 234,
235, 388
traps 
deferred, 308 
precise, 308 
floating-point condition codes (fcc) fields of FSR 
register, 468 


floating-point operate (FPop) instructions, 500, 501
floating-point trap types
IEEE_754_exception, 468, 500, 501
floating-point unit (FPU), 10, 26
FLUSH instruction, 189
memory ordering control, 277
FLUSH instruction
memory/instruction synchronization, 188
FLUSH instruction, 188, 418
data access, 8
immediacy of effect, 190
in multiprocessor system, 188
in self-modifying code, 189
latency, 595
flush instruction memory, See FLUSH instruction 
flush register windows instruction, 192 
FLUSHW instruction, 192, 504
effect, 32
management by window traps, 90, 506
spill exception, 133, 192, 508
FMOVcc instructions
conditionally moving floating-point register
contents, 74
conditions for copying floating-point register
contents, 129
copying a register, 61
encoding of opf<84> bits, 580
encoding of opf_cc instruction field, 582
encoding of rcond instruction field, 581
floating-point moves, 195
FPop instruction, 133
used to avoid branches, 199, 281
FMOVccd instruction, 580
FMOVccq instruction, 580
FMOVd instruction, 193, 579, 580
FMOVDfcc instructions, 195
FMOVdGEZ instruction, 200
FMOVdGZ instruction, 200
FMOVDicc instructions, 195
FMOVdLEZ instruction, 200
FMOVdLZ instruction, 200
FMOVdNZ instruction, 200
FMOVdZ instruction, 200
FMOVq instruction, 193, 579, 580
FMOVQfcc instructions, 195, 198
FMOVqGEZ instruction, 200
FMOVqGZ instruction, 200
FMOVQicc instructions, 195, 198
FMOVqLEZ instruction, 200
FMOVqLZ instruction, 200
FMOVqNZ instruction, 200
FMOVqZ instruction, 200
FMOVr instructions, 134, 581
FMOVRq instructions, 201
FMOVRsGZ instruction, 200
FMOVRsLEZ instruction, 200
FMOVRsLZ instruction, 200
FMOVRsNZ instruction, 200
FMOVRsZ instruction, 200
FMOVs instruction, 193
FMOVScc instructions, 197
FMOVSfcc instructions, 195
FMOVsGEZ instruction, 200
FMOVSicc instructions, 195
FMOVSxcc instructions, 195
MOV Xxcc instructions, 195, 198 
FMUL8SUx16 instruction, 203, 206
FMUL8ULx16 instruction, 203, 206
FMUL8x16 instruction, 203, 204
FMUL8x16AL instruction, 203, 205
FMUL8x16AU instruction, 203, 205
FMULd instruction, 209
FMULD8SUx16 instruction, 203, 207
FMULD8ULx16 instruction, 203, 208
FMULq instruction, 209
FMULs instruction, 209
FNAND instruction, 229
FNANDS instruction, 229
FNEG instructions, 211
FNEGd instruction, 211, 579, 580
FNEGq instruction, 211, 579, 580
FNEGs instruction, 211
FNOR instruction, 229
FNORS instruction, 229
FNOT1 instruction, 227
FNOT1S instruction, 227
FNOT2 instruction, 227
FNOT2S instruction, 227
FONE instruction, 226
FONES instruction, 226
FOR instruction, 229
formats, instruction, 114
FORNOT1 instruction, 229
FORNOT1S instruction, 229
FORNOT2 instruction, 229
FORNOT2S instruction, 229
FORS instruction, 229
fp_disabled exception, 500
absolute value instructions, 173, 174, 235
and GSR, 80
FPop instructions, 134
FPRS.fef disabled, 77
PSTATE.pef not set, 77, 78
with branch instructions, 177, 179
with compare instructions, 182
with conversion instructions, 187, 232, 234, 236
with floating-point arithmetic instructions, 185,
210, 220, 225
with FMOV instructions, 193
with load instructions, 257
with move instructions, 199, 202, 282
with negate instructions, 211
with store instructions, 339, 340, 343, 345, 346,
349, 351, 357, 358, 378
fp_exception exception, 67
fp_exception_ieee_754 "invalid" exception, 231
fp_exception_ieee_754 exception, 500
and tem bit of FSR, 63
cause encoded in FSR.ftt, 64
FSR.aexc, 67
FSR.cexc, 68
FSR.ftt, 67
generated by FCMP or FCMPE, 62
and IEEE 754 overflow/underflow
conditions, 67, 68
trap handler, 388
when FSR.tem = 0, 468
when FSR.tem = 1, 468
with floating-point arithmetic instructions, 174,
185, 210, 235
fp_exception_other exception, 70, 501
absolute value instructions, 173
cause encoded in FSR.ftt, 64
FADDq instruction, 174
FCMP{E}q instructions, 184
FDIVq instruction, 185
FdTOq, FqTOd instructions, 234
FiTOq instruction, 187
FMOVcc instruction, 199
FMOVq instruction, 193
FMOVRq instruction, 202
FMULq, FdMULq instructions, 210
FNEGq instruction, 211
FqTOx, FqTOi instructions, 232
FSQRT instructions, 230
FSUBq instruction, 235
FxTOq instruction, 236
incorrect IEEE Std 754-1985 result, 134, 587
occurrence, 147
supervisor handling, 388
trap type of unfinished_FPop, 65
unimplemented_FPop for quad FPops, 60
when quad FPop unimplemented in
hardware, 66
with floating-point arithmetic instructions, 185,
210
FPACK instruction, 81 

FPACK instructions, 212-216 
FPACK16 instruction, 212, 213 
FPACK16 operation, 213 
FPACK32 instruction, 212, 214 
FPACK32 operation, 214 
FPACKFIX instruction, 212, 216 
FPACKFIX operation, 216 
FPADD16 instruction, 218 
FPADD16S instruction, 218
FPADD32 instruction, 218 
FPADD32S instruction, 218 
FPMERGE instruction, 221 
FPop, 10 
FPop instruction 
unimplemented, 501 
FPop, See floating-point operate (FPop) instructions 
FPRS register 
See also floating-point registers state (FPRS) 
register 
FPRS register, 76, 77 
ASR summary, 71 
definition, 10 
fef field, 134, 467 
RDFPRS instruction, 304 
state after reset, 568 
FPRS register fields 
dl (dirty lower fp registers), 77 
du (dirty upper fp registers), 77
fef, 77 
fef, See also fef field of FPRS register 
FPSUB16 instruction, 223 
FPSUB16S instruction, 223 
FPSUB32 instruction, 223 
FPSUB32S instruction, 223 
FPU, 9, 10
FqTOd instruction, 233 
FqTOi instruction, 231, 389 
FqTOs instruction, 233 
FqTOx instruction, 231, 579, 580 
freg, 610 
FsMULd instruction, 209 
FSQRTd instruction, 230
FSQRTq instruction, 230
FSQRTs instruction, 230
FSR (floating-point state) register 
fields 
aexc (accrued exception), 64, 65, 66, 67, 387 
aexc (accrued exceptions) 
in user-mode trap handler, 388 
-- dza (division by zero) bit of aexc, 69 
-- nxa (rounding) bit of aexc, 70 
cexc (current exception), 62, 64, 65, 67, 68,
387, 500
cexc (current exceptions) 
in user-mode trap handler, 388 
-- dzc (division by zero) bit of cexc, 69 
-- nxc (rounding) bit of cexc, 70 


fcc (condition codes), 61, 64, 65, 388, 611 
fccn, 62 
ftt (floating-point trap type), 61, 63, 67, 134, 
273, 345, 357, 500, 501 
in user-mode trap handler, 388 
not modified by LDFSR/LDXFSR 
instructions, 61 
ns (nonstandard mode), 61, 258, 273 
qne (queue not empty), 61, 66, 258, 273 
in user-mode trap handler, 388 
rd (rounding), 62 
tem (trap enable mask), 62, 66, 68, 389, 390, 
500 
ver, 63 
ver (version), 61, 273 
FSR (floating-point state) register, 61 
after floating-point trap, 387 
compliance with IEEE Std 754-1985, 70 
LDFSR instruction, 258 
reading/writing, 61
state after reset, 568 
values in ftt field, 64 
writing to memory, 345, 357 
FSRC1 instruction, 227
FSRC1S instruction, 227
FSRC2 instruction, 227 
FSRC2S instruction, 227 
FsTOd instruction, 233 
FsTOi instruction, 231, 389 
FsTOq instruction, 233 
FsTOx instruction, 231, 579, 580 
FSUBd instruction, 235 
FSUBq instruction, 235
FSUBs instruction, 235 
functional choice, implementation-dependent, 587 
FXNOR instruction, 229 
FXNORS instruction, 229 
FXOR instruction, 229 
FXORS instruction, 229 
FxTOd instruction, 236, 580 
FxTOq instruction, 236, 580 
FxTOs instruction, 236, 580 
FZERO instruction, 226 
FZEROS instruction, 226 


G 


general status register, See GSR (general status) 
register 
generating constants, 323
GL register, 102 
access, 103 
during resets, 104 
during trap processing, 486 
function, 102 
reading with RDPR instruction, 307, 382 
relationship to TL, 103 
restored during RETRY, 168, 313 
SPARC V9 compatibility, 100 
and TSTATE register, 93 
value restored from TSTATE[TL], 104 
value restored from TSTATE[TL], 103, 168, 313 
and VER.maxgl, 109 
writing to, 103 
global level register, See GL register 
global registers, 22, 26, 49, 587
graphics status register, See GSR (general status) 
register 
GSR (general status) register 
fields 
align, 81 
im (interval mode) field, 80 
irnd (rounding), 81 
mask, 80 
scale, 81 
GSR (general status) register 
ASR summary, 71 
state after reset, 569 
guest watchdog exception, 501 


H 
H superscript on instruction name, 138 
halfword, 10
alignment, 28, 116, 403
data format, 35
hardware
dependency, 586
traps, 473
hardware trap stack, 33 
HINTP register, 107 
HPR state registers (ASRs), 104-111 
hpriv field of HPSTATE register, 458 
HPSTATE register
fields
hpriv
and access to PCR, 78
HPSTATE register, 105
entering hyperprivileged execution mode, 453
hpriv field, 106
hpriv field, See also hyperprivileged (hpriv) field
of HPSTATE register
and HTSTATE register, 106 
ibe field, 502 
ibe field, 105 
red field, 105, 458 
state after reset, 567 
tlz field, 106 
tlz field, and trap level zero exception, 106, 505 
HPSTATE register fields 
hpriv 
determining mode, 12 
hsp (hstick match pending) field of HINTP 
register, 108, 110 
HSTICK_CMPR register, 110, 501
and HINTP, 108 
hstick_match exception, 107, 110, 501 
hstick match pending (hsp) field of HINTP
register, 108, 110 
HTBA (hyperprivileged trap base address) 
register, 108, 455, 501 
establishing table address, 453 
initialization, 470 
state after reset, 567 
htrap_instruction exception, 368, 501
HTSTATE (hyperprivileged trap state) register, 106 
number of copies for reading, 306 
number of copies for writing, 380 
reading, 306 
writing to, 380 
HVER (version) register 
fields 
maxtl, 109 
maxwin, 109 
HVER (version) register, 109 
state after reset, 568 
HVER (version) register fields 
impl, 63, 109 
manuf, 109 
mask, 109 
maxgl, 109 
maxwin, 109 
hyperprivileged, 10 
mode, 91 
registers, 104 
hyperprivileged (hpriv) field of HPSTATE 
register, 305, 355, 378 
access to register-window PR state registers, 91
and trap control, 467 
compare and swap instructions, 166, 362 
disrupting trap condition detected, 464 
load instructions, 245, 249, 255, 263, 268, 337 
privileged action exception, 405 
store instructions, 332, 342, 355 
trap level zero exception, 169, 314, 381, 383, 505 
hyperprivileged mode 
byte order, 468 
hyperprivileged scratchpad registers 
state after reset, 569 
hypervisor (software), 10 


I
i (integer) instruction field 
arithmetic instructions, 285, 287, 290, 321, 329, 
372, 374 
floating point load instructions, 251, 254, 258, 
273 
flush memory instruction, 188 
flush register instruction, 192 
jump-and-link instruction, 241 
load instructions, 242, 262, 263, 265, 267 
logical operation instructions, 151, 290, 385 
move instructions, 281, 283 
POPC, 293 
PREFETCH, 295 
RETURN, 315 
I/O
access, 401 
memory, 400 
memory-mapped, 401 
IEEE 754, 10 
IEEE Std 754-1985, 11, 21, 62, 65, 68, 70, 134, 387, 
587 
IEEE_754_exception floating-point trap type, 11, 64,
65, 67, 70, 387, 388, 468, 500, 501
IEEE-754 exception, 11 
IER register (SPARC V8), 378 
illegal_instruction
and OTHERW instruction, 324
illegal_instruction exception, 192, 501
and SIR instruction, 466 
attempt to write in nonprivileged mode, 85 
DONE/RETRY, 169, 314, 315 
HTSTATE register, reading/writing, 105, 107 
ILLTRAP, 237
instruction not specifically defined in
architecture, 135
not implemented in hardware, 147 
POPC, 294 
PREFETCH, 302 
RETURN, 316 
with BPr instruction, 163 
with branch instructions, 160, 163 
with CASA and CASXA instructions, 166, 290 
with CASXA instruction, 167 
with DONE instruction, 169 
with FMOV instructions, 193 
with FMOVcc instructions, 199 
with load instructions, 55, 249, 252, 266, 268, 
274, 444 
with move instructions, 282, 284 
with RDHPR instructions, 306 
with read hyperprivileged register 
instructions, 306, 307 
with read instructions, 304, 305, 306, 307, 383, 
590 
with store instructions, 340, 346, 352, 353, 355, 
358 
with STQFA instruction, 343
with Tcc instructions, 368 
with TPC register, 91 
with TSTATE register, 93 
with write instructions, 378, 380, 381, 384 
write to ASR 5, 76 
write to STICK register, 84 
write to TICK register, 75 
ILLTRAP instruction, 237, 501 
imm_asi instruction field
explicit ASI, providing, 122 
floating point load instructions, 254 
load instructions, 263, 265, 267 
PREFETCH, 295 
immediate CTI, 113 
I-MMU 
and instruction prefetching, 402 
IMPDEP1 instruction, 239 
IMPDEP1 instructions, 238, 583, 584 
IMPDEP2A instructions, 238, 502, 592 
IMPDEP2B instructions, 134, 238, 502 
implementation, 11 
implementation (impl) field of HVER register, 63, 109 
implementation dependency, 585 
implementation dependent, 11 
implementation note, 5 
implementation-dependent functional choice, 587
implementation-dependent instructions, See 
IMPDEP2A instructions 
implicit ASI, 11, 122, 422 
implicit ASI memory access 
LDFSR, 258 
LDSTUB, 262 
load fp instructions, 251, 273 
load integer doubleword instructions, 265 
load integer instructions, 242 
STD, 352 
STFSR, 345 
store floating-point instructions, 339, 357 
store integer instructions, 331 
SWAP, 360 
implicit byte order, 96 
in registers, 49, 52, 317 
inccc synthetic instructions, 618 
inexact accrued (nxa) bit of aexc field of FSR 
register, 389 
inexact current (nxc) bit of cexc field of FSR 
register, 389 
inexact mask (nxm) field of FSR.tem, 69 
inexact quotient, 321, 372 
infinity, 389, 390 
initiated, 11 
input/output (I/O) locations 
access by nonprivileged code, 588 
behavior, 400 
contents and addresses, 588 
identifying, 595 
order, 400 
semantics, 595 
value semantics, 400 
instruction fields, 11 
See also individual instruction fields 
definition, 11 
instruction group, 11 
instruction MMU, See I-MMU 
instruction prefetch buffer, invalidation, 189 
instruction set architecture (ISA), 11, 23
Instruction Synchronous Fault Status register, See I-SFSR
instruction_access_error exception, 502
instruction_access_exception exception, 502
register update policy, 526
instruction_breakpoint exception, 502
instruction_invalid_TSB_entry exception, 502
instruction_real_translation_miss exception, 502
instructions
32-bit wide, 22
alignment, 116 
alignment, 28, 149, 403 
arithmetic, integer 
addition, 148, 363 
division, 30, 287, 321, 372 
multiplication, 30, 285, 287, 329, 374 
subtraction, 359, 369 
tagged, 30 
array addressing, 152 
atomic 
CASA/CASXA, 165 
load twin extended word from alternate 
space, 270 
load-store, 115, 165, 262, 263, 360, 361 
load-store unsigned byte, 262, 263 
successful loads, 242, 244, 266, 268 
successful stores, 331, 332 
branch 
branch if contents of integer register match 
condition, 162 
branch on floating-point condition codes, 176, 
178 
branch on integer condition codes, 156, 159 
cache, 409 
causing illegal instruction, 237 
compare and swap, 165 
comparison, 124, 359 
conditional move, 32 
control-transfer (CTIs), 30, 168, 313 
conversion 
convert between floating-point formats, 233 
convert floating-point to integer, 231 
convert integer to floating-point, 187, 236 
floating-point to integer, 389 
count of number of bits, 293 
edge handling, 170 
fetches, 116 
floating point 
compare, 61, 62, 183 
floating-point add, 174 
floating-point divide, 185 
floating-point load, 115, 251 
floating-point load from alternate space, 254 
floating-point load state register, 251, 273 
floating-point move, 193, 195, 200 
floating-point operate (FPop), 32, 258 
floating-point square root, 230 
floating-point store, 115, 339
floating-point store to alternate space, 341 
floating-point subtract, 235 
operate (FPop), 63, 67 
short floating-point load, 260 
short floating-point store, 350 
status of floating-point load, 258 
flush instruction memory, 188 
flush register windows, 192 
formats, 114 
generate software-initiated reset, 326 
implementation-dependent, See IMPDEP2A 
instructions 
jump and link, 30, 241 
loads 
block load, 247 
floating point, See instructions: floating point 
integer, 115 
integer from alternate space, 534 
simultaneously addressing doublewords, 360 
unsigned byte, 165, 262 
unsigned byte to alternate space, 263 
logical operations 
64-bit/32-bit, 227, 229 
AND, 151 
logical 1-operand ops on F registers, 226 
logical 2-operand ops on F registers, 227 
logical 3-operand ops on F registers, 229 
logical XOR, 385 
OR, 290 
memory, 418 
moves 
floating point, See instructions: floating point 
move integer register, 279, 283 
on condition, 22 
ordering MEMBAR, 124 
permuting bytes specified by GSR.mask, 158 
pixel component distance, 292
pixel formatting (PACK), 212 
prefetch data, 295 
read hyperprivileged register, 306 
read privileged register, 307 
read state register, 31, 303 
register window management, 32 
reordering, 408 
reserved, 134 
reserved fields, 147 
RETRY 
and restartable deferred traps, 463
RETURN vs. RESTORE, 315
sequencing MEMBAR, 124
set high bits of low word, 323
set interval arithmetic mode, 325
setting GSR.mask field, 158
shift, 30
shift, 327
shift count, 327
shut down to enter power-down mode, 324
SIMD, 17
simultaneous addressing of doublewords, 361
SIR, 326
software-initiated reset, 326
stores
block store, 335
floating point, See instructions: floating point
integer, 115, 331
integer (except doubleword), 331 
integer into alternate space, 332, 534 
partial, 347 
unsigned byte, 165 
unsigned byte to alternate space, 263 
unsigned bytes, 262 
swap R register, 360, 361 
synthetic (for assembly language 
programmers), 616-618 
tagged addition, 363 
test-and-set, 415 
timing, 147 
trap on integer condition codes, 366 
write hyperprivileged register, 380 
write privileged register, 382 
write state register, 377 
integer unit (IU) 
condition codes, 74 
definition, 11 
description, 26 
internal_processor_error exception, 502
interrupt 
enable (ie) field of PSTATE register, 464, 467 
level, 102 
request, 11, 33, 453 
interrupt_level_14 exception, 82, 503
and SOFTINT.int_level, 82
and STICK_CMPR.stick_cmpr, 86
and TICK_CMPR.tick_cmpr, 84
interrupt_level_15 exception, 503
and SOFTINT.int_level, 82
interrupt_level_n exception, 464, 503
and SOFTINT register, 81
and SOFTINT.int_level, 82
inter-strand operation, 11 
intra-strand operation, 11 
invalid accrued (nva) bit of aexc field of FSR 
register, 69 
invalid ASI 
and data access exception, 499 
invalid current (nvc) bit of cexc field of FSR 
register, 69, 389, 390
invalid mask (nvm) field of FSR.tem, 69, 389, 390
invalid exception exception, 231
invalid_fp_register floating-point trap type, 173,
174, 184, 185, 187, 193, 199, 230
INVALW instruction, 240 
iprefetch synthetic instruction, 616 
ISA, 11 
ISA, See instruction set architecture 
I-SFSR register, See SFSR register 
issue unit, 407
issued, 11 
italic font, in assembly language syntax, 609 
IU, 11 
ixc synthetic instructions, 618 
IXX»data, access exception (invalid ASI) 
with load alternate instructions, 268 


J 
jmp synthetic instruction, 616 
JMPL instruction, 241 
computing target address, 30 
does not change CWP, 53 
mem_address_not_aligned exception, 503
reexecuting trapped instruction, 315 
jump and link, See JMPL instruction 


L 

LD instruction (SPARC V8), 242 
LDBLOCKF instruction, 247, 443 

LDD instruction (SPARC V8 and V9), 266 
LDDA instruction, 442 

LDDA instruction (SPARC V8 and V9), 268 
LDDF instruction, 116, 251, 503 


LDDF_mem_address_not_aligned exception, 503
address not doubleword aligned, 593
address not quadword aligned, 594
LDDF/LDDFA instruction, 116
load instruction with partial store ASI and
misaligned address, 256
with load instructions, 252, 255, 444
with store instructions, 342, 444
LDDF_mem_address_not_aligned exception, 60
LDDFA instruction, 254, 349
alignment, 116
ASIs for fp load operations, 444
behavior with partial store ASIs, 252, 256, 273, 444
causing LDDF mem address not aligned 
exception, 116, 503 
for block load operations, 443 
reading from a CMP register, 536 
used with ASIs, 443 
LDF instruction, 60, 251 
LDFA instruction, 60, 254 
LDFSR instruction, 61, 63, 64, 258, 502 
LDQF instruction, 251, 506
LDQF_mem_address_not_aligned exception, 506
address not quadword aligned, 594
LDQF/LDQFA instruction, 117 
with load instructions, 255 

LDQFA instruction, 254 

LDSB instruction, 242 

LDSBA instruction, 244 

LDSH instruction, 242 

LDSHA instruction, 244 

LDSHORTF instruction, 260

LDSTUB instruction, 115, 262, 263, 415, 416
and data_access_exception (noncacheable page)
exception, 499
hardware primitives for mutual exclusion of
LDSTUB, 414
LDSTUBA instruction, 262, 263
alternate space addressing, 29
and data_access_exception exception, 499
hardware primitives for mutual exclusion of
LDSTUBA, 414
LDSW instruction, 242 
LDSWA instruction, 244 
LDTW instruction, 55, 116 
LDTW instruction (deprecated), 265 
LDTWA instruction, 55, 116 
LDTWA instruction (deprecated), 267 
LDTX instruction, 439 
LDTXA instruction, 118, 120, 270, 440 
access alignment, 116 
access size, 116 
and data_access_exception (noncacheable page)
exception, 499
LDUB instruction, 242 
LDUBA instruction, 244 
LDUH instruction, 242 
LDUHA instruction, 244 
LDUW instruction, 242 
LDUWA instruction, 244 
LDX instruction, 242 
LDXA instruction, 244, 269, 412, 534 
reading from a CMP register, 536 
LDXFSR instruction, 61, 63, 64, 258, 273, 319, 502 
leaf procedure 
modifying windowed registers, 132 
little-endian byte order, 12, 28, 96 
load 
block, See block load instructions 
floating-point from alternate space 
instructions, 254 
floating-point instructions, 251, 258 
floating-point state register instructions, 251, 273 
from alternate space, 29, 74, 122, 534 
instructions, 12 
instructions accessing memory, 115 
nonfaulting, 407 
short floating-point, See short floating-point load 
instructions 
LoadLoad MEMBAR relationship, 276 
LoadLoad MEMBAR relationship, 417 
LoadLoad predefined constant, 614 
loads 
nonfaulting, 419 
load-store alignment, 28, 116, 403 
load-store instructions 
compare and swap, 165 
definition, 12 
and fast data access protection exception, 500 
load-store unsigned byte, 165, 262, 360, 361 
load-store unsigned byte to alternate space, 263 
memory access, 27 
swap R register with alternate space 
memory, 361 
swap R register with memory, 165, 360 
LoadStore MEMBAR relationship, 276, 417 
LoadStore predefined constant, 614 
local registers, 49, 52, 309 
logical XOR instructions, 385 
Lookaside predefined constant, 614 
LSTPARTIALF instruction, 444
M 
machine state 
after reset, 566, 570 
in RED state, 566, 570
manufacturer (manuf) field of HVER register, 109 
manufacturer (manuf) field of VER register, 592 
mask number (mask) field of HVER register, 109 
MAXGL, 26, 49, 100, 102, 103 
maximum global levels maxgl field of HVER 
register, 109 
maximum trap levels maxtl field of HVER 
register, 109 
MAXPGL, 100, 102 
MAXTL 
and error state, 488
and MAXGL, 103 
and RED state, 488 
instances of HTSTATE register, 106 
instances of TNPC register, 92 
instances of TPC register, 91 
instances of TSTATE register, 93 
instances of TT register, 94 
non-reset trap, 457 
may (keyword), 12 
mem_address_not_aligned exception, 503
JMPL instruction, 241 
LDTXA, 440, 441, 443 
load instruction with partial store ASI and 
misaligned address, 256 
register update policy, 526 
RETURN, 316 
when recognized, 167 
with CASA instruction, 166 
with compare instructions, 167 
with load instructions, 116-117, 242, 243, 245, 
251, 258, 266, 268, 269, 273, 357, 443, 444 
with store instructions, 116-117, 331, 332, 334, 
343, 346, 353, 355, 443, 444 
with swap instructions (deprecated), 360, 362 
MEMBAR 
#Sync 
semantics, 278 
instruction 
atomic operation ordering, 416 
FLUSH instruction, 188, 418 
functions, 275, 415-417 
memory ordering, 277 
memory synchronization, 124 
side-effect accesses, 402 
STBAR instruction, 277
write to error steering register, 555 
mask encodings 
LoadLoad, 276, 417 
LoadStore, 276, 417
Lookaside, 276, 418
MemIssue, 276, 418
StoreLoad, 276, 417
StoreStore, 276, 417 
Sync, 276, 418 
predefined constants 
LoadLoad, 614 
LoadStore, 614 
Lookaside, 614 
MemIssue, 614 
StoreLoad, 614 
StoreStore, 614 
Sync, 614 
MEMBAR
#Lookaside, 412
#StoreLoad, 412
membar_mask, 614
MemIssue predefined constant, 614 
memory
access instructions, 27, 115
alignment, 403
atomic operations, 414
atomicity, 595
cached, 400
coherence, 402, 595
coherency unit, 403
data, 418
instruction, 418
location, 400
models, 399
ordering unit, 403
real, 400
reference instructions, data flow order
constraints, 408
synchronization, 277
virtual address, 400
virtual address 0, 420
memory management architecture
(hyperprivileged), 519
address translation, 520
allocation of partition IDs, 520
separation of real and virtual addresses, 519
Memory Management Unit
definition, 12
Memory Management Unit, See MMU
memory model 
mode control, 411 
partial store order (PSO), 410 
relaxed memory order (RMO), 277, 410 
sequential consistency, 411 
strong, 411 
total store order (TSO), 277, 410, 412 
weak, 411 
memory model (mm) field of PSTATE register, 96 
memory order 
pending transactions, 410 
program order, 407 
memory_model (mm) field of PSTATE register, 411
memory-mapped I/O, 401 
metrics 
for architectural performance, 451 
for implementation performance, 451 
See also performance monitoring hardware 
MMU 
accessing registers, 526 
bypass, 421 
definition, 12 
dTLB Tag Access Register illustrated, 527 
iTLB Tag Access Register illustrated, 527 
page sizes, 517 
mode 
hyperprivileged, 91, 406 
MMU bypass, 421 
nonprivileged, 24 
privileged, 26, 91, 406 
motion estimation, 292 
MOVA instruction, 279 
MOVCC instruction, 279 
MOVcc instructions, 279 
conditionally moving integer register 
contents, 74 
conditions for copying integer register 
contents, 129 
copying a register, 61 
encoding of cond field, 581 
encoding of opf_cc instruction field, 582
used to avoid branches, 199, 281 
MOVCS instruction, 279 
move floating-point register if condition is true, 195 
move floating-point register if contents of integer 
register satisfy condition, 200 
MOVE instruction, 279 
move integer register if condition is satisfied 
instructions, 279
move integer register if contents of integer register
satisfy condition instructions, 283
move on condition instructions, 22
MOVFA instruction, 280
MOVFE instruction, 280
MOVFG instruction, 280
MOVFGE instruction, 280
MOVFL instruction, 280
MOVFLE instruction, 280
MOVFLG instruction, 280
MOVFN instruction, 280
MOVFNE instruction, 280
MOVFO instruction, 280
MOVFU instruction, 280
MOVFUE instruction, 280
MOVFUG instruction, 280
MOVFUGE instruction, 280
MOVFUL instruction, 280
MOVFULE instruction, 280
MOVG instruction, 279
MOVGE instruction, 279
MOVGU instruction, 279
MOVL instruction, 279
MOVLE instruction, 279
MOVLEU instruction, 279
MOVN instruction, 279
movn synthetic instructions, 618
MOVNE instruction, 279
MOVNEG instruction, 279
MOVPOS instruction, 279
MOVr instructions, 130, 283, 581
MOVRGEZ instruction, 283
MOVRGZ instruction, 283
MOVRLEZ instruction, 283
MOVRLZ instruction, 283
MOVRNZ instruction, 283
MOVRZ instruction, 283
MOVVC instruction, 279
MOVVS instruction, 279
multiple unsigned condition codes, emulating, 130
multiply instructions, 287, 329, 374
multiprocessor synchronization instructions, 165,
360, 361
multiprocessor system, 12, 188, 300, 360, 361, 409,
595
MULX instruction, 287
must (keyword), 12








N 
N superscript on instruction name, 138 
N_REG_WINDOWS, 13
integer unit registers, 26, 587 
RESTORE instruction, 309 
SAVE instruction, 317 
value of, 49, 86 
NaN (not-a-number) 
conversion to integer, 389 
converting floating-point to integer, 231 
signalling, 62, 183, 233 
neg synthetic instructions, 618 
negative infinity, 389, 390 
nested traps, 22 
next program counter register, See NPC register 
NFO, 12 
noncacheable 
accesses, 400 
nonfaulting load, 12, 407 
nonfaulting loads 
behavior, 419 
use by optimizer, 420 
nonleaf routine, 241 
nonprivileged, 12 
mode, 7, 13, 24, 26, 64 
software, 76 
nonprivileged trap (npt) field of TICK register, 76, 
305 
nonresumable_error exception, 503 
nonstandard floating-point, See floating-point status 
register (FSR) NS field 
nontranslating ASI, 13, 269, 355, 422 
nonvirtual memory, 301 
NOP instruction, 156, 176, 179, 288, 296, 367 
normal trap, 13 
normal traps, 457, 472, 487, 490 
NORMALW instruction, 289 
not synthetic instructions, 617 
note 
architectural direction, 5 
compatibility, 5 
general, 4 
implementation, 5 
programming, 4 
NPC (next program counter) register, 76 
control flow alteration, 17 
definition, 12 
DONE instruction, 168 
instruction execution, 113 
relation to TNPC register, 92
RETURN instruction, 313 
saving after trap, 33 
state after reset, 567 
npt, 13 
nucleus context, 190 
nucleus software, 13 
NUMA, 13, 533 
nvm (invalid mask) field of FSR.tem, 69, 389, 390 
NWIN, See N_REG_WINDOWS
nxm (inexact mask) field of FSR.tem, 69 


O 
octlet, 13 
odd parity, 13 
ofm (overflow mask) field of FSR.tem, 69 
op3 instruction field 
arithmetic instructions, 148, 160, 163, 165, 285, 
287, 321, 329, 372, 374 
floating point load instructions, 251, 254, 258, 
273 
flush instructions, 188, 192 
jump-and-link instruction, 241 
load instructions, 242, 262, 263, 265, 267 
logical operation instructions, 151, 290, 385 
PREFETCH, 295 
RETURN, 315 
opcode 
definition, 13 
format, 239 
opf instruction field 
floating point arithmetic instructions, 174, 185, 
209, 230 
floating point compare instructions, 183 
floating point conversion instructions, 231, 233, 
236 
floating point instructions, 173 
floating point integer conversion, 187 
floating point move instructions, 193 
floating point negate instructions, 211 
opf_cc instruction field 
floating point move instructions, 195 
move instructions, 582 
opf_low instruction field, 195 
optional, 13 
OR instruction, 290 
ORcc instruction, 290 
ordering MEMBAR instructions, 124
ordering unit, memory, 403
ORN instruction, 290 
ORNcc instruction, 290 
OTHERW instruction, 291 
OTHERWIN (other windows) register, 88 
FLUSHW instruction, 192 
keeping consistent state, 90 
modified by OTHERW instruction, 291 
partitioned, 90 
range of values, 86, 596 
rd designation for WRPR instruction, 382 
rs1 designation for RDPR instruction, 307 
SAVE instruction, 318 
state after reset, 568 
zeroed by INVALW instruction, 240 
zeroed by NORMALW instruction, 289 
OTHERWIN register trap vectors 
fill/spill traps, 507 
handling spill/fill traps, 507 
selecting spill/fill vectors, 508 
out register #7, 55 
out registers, 49, 52, 317 
overflow 
bits 
(V) in condition fields of CCR, 125 
accrued (ofa) in aexc field of FSR register, 69 
current (ofc) in cexc field of FSR register, 69 
causing spill trap, 507 
tagged add/subtract instructions, 125
overflow mask (ofm) field of FSR.tem, 69


P 

p (predict) instruction field of branch
instructions, 159, 162, 163, 179
P superscript on instruction name, 138
PA_watchpoint exception, 422, 503
packed-to-planar conversion, 221
packing instructions, See FPACK instructions
page fault, 301
page table entry (PTE), See translation table entry
(TTE)
parity, even, 9
parity, odd, 13
park, 13
parked, 13
parking CMP core, 544
partial store instructions, 347, 444
partial store order (PSO) memory model, 410, 411
Partition ID register
memory address representation, 519-520 
and TLB entries, 520 
partition identifier, 404, 519 
partitioned 
additions, 218 
subtracts, 223 
PASI superscript on instruction name, 138
PASR superscript on instruction name, 138
PC (program counter) register, 15, 71, 76 
after instruction execution, 113 
CALL instruction, 164 
changed by NOP instruction, 288 
copied by JMPL instruction, 241 
saving after trap, 33 
set by DONE instruction, 168 
set by RETRY instruction, 313 
state after reset, 567 
Trap Program Counter register, 91 
PCR 
ASR summary, 71 
PCR register fields 
priv, 79 
sl (select lower bits of PIC), 78 
st (system trace enable), 79
su (select upper bits of PIC), 78 
ut (user trace enable), 79 
PDIST instruction, 292 
pef field of PSTATE register 
and access to GSR, 80 
and fp_disabled exception, 500
and FPop instructions, 134 
branch operations, 177, 179 
byte permutation, 158 
comparison operations, 181, 184 
data movement operations, 282 
enabling FPU, 77 
floating-point operations, 173, 174, 185, 187, 193, 
198, 201, 209, 211, 230, 231, 233, 235, 236, 251, 
254, 258, 260, 273 
integer arithmetic operations, 220, 225 
logical operations, 226, 227, 229 
memory operations, 249 
read operations, 305, 325, 337 
special addressing operations, 149, 175, 339, 345, 
349, 351, 357, 378 
trap control, 467 
pef, See PSTATE, pef field
Performance Control register, See PCR
performance instrumentation counter register, See
PIC register
performance monitoring hardware 
accuracy requirements, 451 
classes of data reported, 451 
counters and controls, 452 
high-level requirements, 449 
kinds of user needs, 449 
See also instruction sampling 
physical address, 14 
physical core, 14 
physical processor, 14 
PIC (performance instrumentation counter) 
register, 14, 79 
accessing, 504 
ASR summary, 71 
and PCR, 78 
picl field, 79 
picu field, 79 
pic_overflow exception, 503
PIL (processor interrupt level) register, 102 
interrupt conditioning, 464 
interrupt request level, 468 
interrupt_level_n, 503
specification of register to read, 307 
specification of register to write, 382 
state after reset, 567 
trap processing control, 467 
pipeline, 14 
pipeline draining of CPU, 87, 91
PIPT, 14 
pixel instructions 
compare, 180 
component distance, 292
formatting, 212 
pixel registers for storing values, 238 
planar-to-packed conversion, 221 
Pnpt superscript on instruction name, 138
POPC instruction, 293 
POR, 14 
POR (power on reset), 564 
machine state changes, 566 
POR, See power on reset (POR) 
positive infinity, 389, 390 
power failure, 466, 494 
power_on_reset (POR)
hard reset when POR pin activated, 564 
power on reset (POR), 503, 564 
effect on HTSTATE, 107
effect on STICK register fields, 85 
effect on TNPC register, 92 
effect on TPC, 92 
effect on TT register, 94 
enabling/disabling virtual processors, 541, 542 
full-processor reset, 550 
hard reset, 543, 601 
machine state changes, 566 
and RED_state, 457, 459, 490
STRAND_ENABLE_STATUS register, 553
system reset, 550 
when initiated, 466
Ppic superscript on instruction name, 138
precise floating-point traps, 308
precise trap, 461
conditions for, 461
software actions, 461
vs. disrupting trap, 463
predefined constants
LoadLoad, 614
Lookaside, 614
MemIssue, 614
StoreLoad, 614
StoreStore, 614
Sync, 614
predict bit, 163
prefetch
for one read, 300
for one write, 300
for several reads, 299
for several writes, 300
page, 301
prefetch data instruction, 295
PREFETCH instruction, 115, 295, 591
prefetch_fcn, 614
PREFETCHA instruction, 295, 591
and invalid ASI or VA, 499
prefetchable, 14 
priority of traps, 468, 485 
privileged_action exception
read from TICK register when access disabled, 75
privilege violation
and data_access_exception, 498, 502
privileged, 14
mode, 26, 91, 406
registers, 91
software, 25, 53, 64, 97, 123, 192, 470, 591
privileged (priv) field of PCR register, 305
privileged (priv) field of PSTATE register, 99, 106,
166, 169, 245, 249, 254, 255, 263, 268, 305, 332,
337, 342, 355, 361, 362, 378, 406, 503, 504
privileged mode, 14
privileged_action exception, 503
accessing restricted ASIs, 405 
PIC access, 79 
read from TICK register when access disabled, 75 
register update policy, 526 
restricted ASI access attempt, 123, 422 
TICK register access attempt, 74 
with CASA instruction, 166 
with compare instructions, 167 
with load alternate instructions, 245, 249, 255,
263, 268, 332, 337, 342, 355, 362, 378
with load instructions, 254 
with RDasr instructions, 305 
with read instructions, 305 
with store instructions, 344 
with swap instructions, 362 
privileged_opcode exception, 504
DONE instruction, 169 
RETRY instruction, 314 
SAVED instruction, 319 
with DONE instruction, 169, 307, 314, 383 
with write instructions, 384 
WRPR in nonprivileged mode, 75 
processor, 15 
execute unit, 407 
issue unit, 407
privilege-mode transition diagram, 456 
reorder unit, 407 
self-consistency, 408 
state diagram, 457 
processor cluster, See processor module 
processor consistency, 409, 413 
processor interrupt level register, See PIL register 
processor self-consistency, 408, 412 
processor state register, See PSTATE register 
processor states 
error_state, 457, 460, 486, 487, 488
entering, 494, 495, 496
execute_state, 486, 488
RED_state, 457, 458, 459, 473, 486, 488, 490,
492, 496
processor states, See error_state,
execute_state, and RED_state
program counter register, See PC register
program counters, saving, 453
program order, 407, 408
programming note, 4 
PSO, See partial store order (PSO) memory model
PSR register (SPARC V8), 378 
PSTATE register 
fields 
priv 
and access to PCR, 78 
PSTATE register 
entering privileged execution mode, 453 
restored by RETRY instruction, 168, 313 
saved after trap, 453 
saving after trap, 33 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 567 
and TSTATE register, 93 
PSTATE register fields 
ag 
unimplemented, 100 
am 
CALL instruction, 164 
description, 97 
masked/unmasked address, 168, 241, 313,
315 
cle 
and implicit ASIs, 122 
and PSTATE.tle, 96 
description, 96
ie
description, 99, 100
enabling disrupting traps, 464 
interrupt conditioning, 464 
masking disrupting trap, 474 
mm 
description, 96 
implementation dependencies, 96, 97, 410, 
595 
reserved values, 96 
pef 
and FPRS.fef, 97 
description, 97 
See also pef field of PSTATE register 
priv 
access to register-window PR state 
registers, 91 
accessing restricted ASIs, 405 
description, 99 
determining mode, 12, 14, 523
when processor in privileged mode, 106
tle 
and PSTATE.cle, 96 
description, 96 
PTE (page table entry), See translation table entry 
(TTE) 


Q 
quadword, 15 
alignment, 28, 116, 403 
data format, 35 
quiet NaN (not-a-number), 62, 183 


R 
R register, 15 
#15, 55 
special-purpose, 55 
alignment, 266, 268 
RA_watchpoint exception, 477, 483
rational quotient, 372 
R-A-W, See read-after-write memory hazard 
rcond instruction field 
branch instructions, 162 
encoding of, 581 
move instructions, 283 
rd (rounding), 15 
rd instruction field 
arithmetic instructions, 148, 160, 163, 165, 285, 
287, 321, 329, 372, 374 
floating point arithmetic, 174 
floating point arithmetic instructions, 185, 209, 
230 
floating point conversion instructions, 231, 233, 
236 
floating point integer conversion, 187 
floating point load instructions, 251, 254, 258, 
273 
floating point move instructions, 193, 195 
floating point negate instructions, 211 
floating-point instructions, 173 
jump-and-link instruction, 241 
load instructions, 242, 262, 263, 265, 267 
logical operation instructions, 151, 290, 385 
move instructions, 281, 283 
POPC, 293 
RDASI instruction, 70, 74, 303 
RDasr instruction, 303 
accessing I/O registers, 29
implementation dependencies, 304, 590 
reading ASRs, 70 
RDCCR instruction, 70, 72, 303
RDFPRS instruction, 71, 77, 303 
RDGSR instruction, 71, 80, 303 
RDHPR instruction, 104, 105, 107, 109, 306 
hyperprivileged registers read, 306 
RDPC instruction, 71, 303 
reading PC register, 76 
RDPCR instruction, 71, 303 
RDPIC instruction, 71, 303, 504 
RDPR instruction, 71, 307 
accessing GL register, 103 
accessing non-register-window PR state 
registers, 91 
accessing register-window PR state registers, 86 
and register-window PR state registers, 86 
effect on TNPC register, 92 
effect on TPC register, 92 
effect on TSTATE register, 94 
effect on TT register, 95 
reading privileged registers, 91 
reading PSTATE register, 95 
reading the TICK register, 75 
registers read, 307 
RDSOFTINT instruction, 71, 81, 303 
RDSTICK instruction, 71, 84, 85, 303 
RDSTICK_CMPR instruction, 71, 303 
RDTICK instruction, 71, 75, 303 
RDTICK_CMPR instruction, 71, 303 
RDY instruction, 72 
read ancillary state register (RDasr) 
instructions, 303 
read state register instructions, 31 
read-after-write memory hazard, 408 
real address, 15 
real ASI, 422 
real memory, 400 
Real Range registers 
fields, 527 
real-translating ASIs, 422 
RED_state, 15
catastrophic failure avoidance, 486 
description, 456 
entering, 459, 492, 595 
entry conditions, 457 
exiting, 106 
red field of HPSTATE register, 457, 458, 459, 486,
487
restricted environment, 458 
special trap processing, 490 
trap processing, 459, 486, 488 
trap table, 473 
trap vector, 471, 595 
RED_state trap, 15
RED_state_exception exception, 504
reference MMU, 609 
reg, 610 
reg_or_imm, 615
reg_plus_imm, 614
regaddr, 614 
register reference instructions, data flow order 
constraints, 408 
register window, 49 
register window management instructions, 32 
register windows 
clean, 88, 90, 131, 498, 506, 507, 508 
fill, 53, 90, 131, 132, 310, 311, 319, 500, 507, 508, 
509 
management of, 24 
overlapping, 52-54 
spill, 53, 90, 131, 132, 133, 318, 319, 504, 507, 508, 
509 
registers 
See also individual register (common) names 
accessing MMU registers, 526 
address space identifier (ASI), 406 
ASI (address space identifier), 74 
chip-level multithreading, See CMT 
clean windows (CLEANWIN), 88 
clock-tick (TICK), 504 
current window pointer (CWP), 87 
F (floating point), 387, 468 
floating-point, 26 
programming, 59 
floating-point registers state (FPRS), 76 
floating-point state (FSR), 61 
general status (GSR), 80 
GL (global level), 109 
global, 22, 26, 49, 587
global level (GL), 102 
HSTICK_CMPR
and HINTP, 108
HSTICK_CMPR, 110
HTSTATE (hyperprivileged trap state), 106 
HVER (version register), 109 
hyperprivileged, 104 
IER (SPARC V8), 378
in, 49, 52, 317 
local, 49, 52 
next program counter (NPC), 76 
other windows (OTHERWIN), 88 
out, 49, 52, 317
out #7, 55 
performance control (PCR), 78 
performance instrumentation counter (PIC), 79 
pixel storage registers, 238 
processor interrupt level (PIL)
and PIC, 80
and PIC counter overflow, 80
and SOFTINT, 82
and STICK_CMPR, 86
and TICK_CMPR, 84
processor interrupt level (PIL), 102
program counter (PC), 76 
PSR (SPARC V8), 378 
R register #15, 55 
renaming mechanism, 408 
restorable windows (CANRESTORE), 88
savable windows (CANSAVE), 87 
scratchpad
hyperprivileged, 446
privileged, 445
SOFTINT, 71
SOFTINT_CLR pseudo-register, 71, 83
SOFTINT_SET pseudo-register, 71, 82
STICK, 84
STICK_CMPR
and HINTP, 108
ASR summary, 71
int_dis field, 82, 86
stick_cmpr field, 86
and system software trapping, 85
TBR (SPARC V8), 378
TICK, 74
TICK_CMPR
int_dis field, 82, 84
tick_cmpr field, 84
TICK_CMPR, 71, 83
TL (trap level), 109 
trap base address (TBA), 95 
trap base address, See registers: TBA 
trap level (TL), 100 
trap level, See registers: TL 
trap next program counter (TNPC), 92 
trap next program counter, See registers: TNPC
trap program counter (TPC), 91
trap program counter, See registers: TPC 
trap state (TSTATE), 93 
trap state, See registers: TSTATE 
trap type (TT), 94, 472 
trap type, See registers: TT 
VA_WATCHPOINT, 505
visible to software in privileged mode, 91-104 
WIM (SPARC V8), 378 
window state (WSTATE), 89 
window state, See registers: WSTATE 
Y (32-bit multiply/divide), 72 
relaxed memory order (RMO) memory model, 277, 
410 
renaming mechanism, register, 408 
reorder unit, 407 
reordering instruction, 408 
reserved, 15 
fields in instructions, 147 
register field, 48 
reset 
externally initiated reset (XIR), 457, 459, 466, 
472, 490, 494, 500, 551, 565
power on reset (POR) 
enabling/disabling virtual processors, 541, 
542 
machine state changes, 566 
STRAND_ENABLE_STATUS register, 553
power on reset (POR), 457, 459, 466, 490, 503, 
550, 564 
power-on, 75 
processing, 458 
request, 503, 504 
reset trap, 75, 94, 463, 466 
software initiated reset (SIR), 456, 457, 459, 466, 
472, 486, 495, 504, 550, 566 
trap, 589 
trap vector address, See RSTVaddr 
warm reset (WMR) 
and STRAND_ENABLE register, 543
enabling/disabling virtual processors, 541, 
542 
machine state changes, 566 
warm reset (WMR), 565 
watchdog (WDR), 550 
watchdog reset (WDR), 459
watchdog reset (WDR) 
and guest watchdog, 455 
watchdog reset (WDR), 490, 494, 505, 550, 566 


Index xxix 


XIR, 551 
reset trap, 16 
Reset, Error, and Debug state, See RED_state
restartable deferred trap, 462 
restorable windows register, See CANRESTORE 
register 
RESTORE instruction, 53, 309-310 
actions, 131 
and current window, 55 
decrementing CWP register, 52 
fill trap, 500, 507 
followed by SAVE instruction, 53 
managing register windows, 32 
operation, 309 
performance trade-off, 309, 317 
and restorable windows (CANRESTORE) 
register, 88 
restoring register window, 309 
role in register state partitioning, 90 
restore synthetic instruction, 617 
RESTORED instruction, 132, 311 
creating inconsistent window state, 311 
fill handler, 310 
fill trap handler, 132, 509 
register window management, 32 
restricted, 16 
restricted address space identifier, 123 
restricted ASI, 405, 421 
resumable_error exception, 504
ret/retl synthetic instructions, 616
RETRY instruction, 313 
and restartable deferred traps, 463 
effect on HTSTATE, 107 
effect on TNPC register, 92 
effect on TPC register, 92 
effect on TSTATE register, 94 
executed in RED_state, 458
generating illegal_instruction exception, 502
modifying CCR.xcc, 73 
reexecuting trapped instruction, 508 
restoring gl value in GL, 104 
return from trap, 453 
returning to instruction after trap, 465 
target address, return from privileged traps, 31 
RETURN instruction, 315-316 
computing target address, 30 
fill trap, 500 
mem_address_not_aligned exception, 503
operation, 315
reexecuting trapped instruction, 315
RETURN vs. RESTORE instructions, 315 
RMO, 16 
RMO, See relaxed memory order (RMO) memory 
model 
rounding 
for floating-point results, 62 
in signed division, 321 
rounding direction (rd) field of FSR register, 174, 
185, 209, 230, 231, 233, 235, 236 
routine, nonleaf, 241 
rs1 instruction field 
arithmetic instructions, 148, 160, 163, 165, 285, 
287, 321, 329, 372, 374 
branch instructions, 162 
floating point arithmetic instructions, 174, 185, 
209 
floating point compare instructions, 183 
floating point load instructions, 251, 254, 258, 
273 
flush memory instruction, 188 
jump-and-link instruction, 241 
load instructions, 242, 262, 263, 265, 267 
logical operation instructions, 151, 290, 385 
move instructions, 283 
PREFETCH, 295 
RETURN, 315 
rs2 instruction field 
arithmetic instructions, 148, 160, 163, 165, 285, 
287, 290, 321, 329, 372, 374 
floating point arithmetic instructions, 174, 185, 
209, 230 
floating point compare instructions, 183 
floating point conversion instructions, 231, 233, 
236 
floating point instructions, 173 
floating point integer conversion, 187 
floating point load instructions, 251, 254, 258, 
273 
floating point move instructions, 193, 195 
floating point negate instructions, 211 
flush memory instruction, 188 
jump-and-link instruction, 241 
load instructions, 242, 265, 267 
logical operation instructions, 151, 385 
move instructions, 281, 283 
POPC, 293 
PREFETCH, 295 
RSTVADDR, 459, 471, 472, 473, 492, 493, 494, 495,
496, 497, 567, 595
RTO, 16
RTS, 16


S
savable windows register, See CANSAVE register
SAVE instruction, 52, 317
actions, 131
after RESTORE instruction, 315
clean_window exception, 498, 507
and current window, 55
decrementing CWP register, 52
effect on privileged state, 318
leaf procedure, 241
and local/out registers of register window, 53
managing register windows, 32
no clean window available, 88
number of usable windows, 88
operation, 317
performance trade-off, 317
role in register state partitioning, 90
and savable windows (CANSAVE) register, 87
spill trap, 504, 507, 508
save synthetic instruction, 617
SAVED instruction, 132, 319
creating inconsistent window state, 319
register window management, 32
spill handler, 318, 319
spill trap handler, 132, 509
scaling of the coefficient, 204
scratchpad registers
hyperprivileged, 446
privileged, 445
state after reset, 569
SDIV instruction, 72, 321
SDIVcc instruction, 72, 321
SDIVX instruction, 287
self-consistency, processor, 408
self-modifying code, 188, 189, 416
sequencing MEMBAR instructions, 124
sequential consistency, 402, 410, 411
sequential consistency memory model, 411
service processor, 16
SETHI instruction, 125, 323
creating 32-bit constant in R register, 29
and NOP instruction, 288
with rd = 0, 323
setn synthetic instructions, 617
SFSR register, 16
data_access_exception, 498
fault type field (ft), 498
state after reset, 569
update policy, 526
shall (keyword), 16
shared memory, 399
shift count encodings, 327
shift instructions, 30, 124, 327
short floating-point load and store instructions, 444
short floating-point load instructions, 260
short floating-point store instructions, 350
should (keyword), 16
SHUTDOWN instruction, 324
SIAM instruction, 325
side effect
accesses, 401
definition, 16
I/O locations, 400
instruction prefetching, 402
real memory storage, 400
visible, 401
signalling NaN (not-a-number), 62, 233
signed integer data type, 35
signx synthetic instructions, 617
SIMD, 17
instruction data formats, 43-45
simm10 instruction field
move instructions, 283
simm11 instruction field
move instructions, 281
simm13 instruction field
floating point
load instructions, 251, 273
simm13 instruction field
arithmetic instructions, 285, 287, 290, 321, 329,
372, 374
floating point load instructions, 254, 258
flush memory instruction, 188
jump-and-link instruction, 241
load instructions, 242, 262, 263, 265, 267
logical operation instructions, 151, 385
POPC, 293
PREFETCH, 295
RETURN, 315
single instruction/multiple data, See SIMD
SIR, 17
SIR (software initiated reset), 566
SIR instruction, 326
affecting virtual processor, 566 
causing software_initiated_reset exception, 466,
504
and trap priority, 485 
use by supervisor software, 495 
SIR, See software initiated reset (SIR) 
SLL instruction, 327 
SLLX instruction, 327 
SMUL instruction, 72, 329 
SMULcc instruction, 72, 329 
snooping, 17 
SOFTINT register, 71, 81 
clearing, 513 
clearing of selected bits, 83 
communication from nucleus code to kernel 
code, 512 
scheduling interrupt vectors, 511, 512 
setting, 512 
state after reset, 568 
SOFTINT register fields 
int_level, 82
sm (stick_int), 82
tm (tick_int), 82, 84
SOFTINT_CLR pseudo-register, 71, 83
SOFTINT_SET pseudo-register, 71, 82, 83
software 
nucleus, 13 
software translation table, 518 
software trap, 367, 470, 473 
software trap number (SWTN), 367 
software, nonprivileged, 76 
software initiated reset (SIR), 495, 504, 566 
entering error_state, 456
entering RED_state, 457
and MAXTL, 459
per-strand reset, 550
RED_state trap processing, 490
RED_state trap vector, 472
SIR instruction, 326, 466 
and virtual processor, 566 
virtual processor trap processing, 486 
when TL = MAXTL, 486 
software trap number, 615 
source operands, 218, 223 
SPA
ASI_TWIN_DW_NUCLEUS, 447
SPARC V8 compatibility
LD, LDUW instructions, 242
operations to I/O locations, 402
read state register instructions, 304 
STA instruction renamed, 334 
STBAR instruction, 277 
STD instruction, 353 
STDA instruction, 355 
tagged subtract instructions, 371 
UNIMP instruction renamed, 237 
window overflow exception superseded, 500 
write state register instructions, 378 
SPARC V9 
compliance, 13 
features, 22 
SPARC V9 Application Binary Interface (ABI), 24 
special trap, renamed, 457 
special traps, 457, 473 
speculative load, 17 
spill register window, 504 
FLUSH instruction, 133 
overflow/underflow, 53
RESTORE instruction, 131 
SAVE instruction, 90, 131, 317, 507 
SAVED instruction, 132, 319, 509 
selection of, 507 
trap handling, 508 
trap vectors, 318, 508 
window state, 90 
spill_n_normal exception, 318, 504
and FLUSHW instruction, 192
spill_n_other exception, 318, 504
and FLUSHW instruction, 192
SRA instruction, 327 
SRAX instruction, 327 
SRL instruction, 327 
SRLX instruction, 327 
stack frame, 317 
state registers (ASRs), 70-86 
STB instruction, 331 
STBA instruction, 332 
STBAR instruction, 304, 377, 408, 416 
STBLOCKF instruction, 335, 443
STDF instruction, 116, 339, 504
STDF_mem_address_not_aligned exception, 504
and store instructions, 340, 343
STDF/STDFA instruction, 116
STDFA instruction, 341
alignment, 116
ASIs for fp store operations, 444
causing data_access_exception exception, 444
causing mem_address_not_aligned or
illegal_instruction exception, 444
causing STDF_mem_address_not_aligned
exception, 116, 504
for block load operations, 443 
for partial store operations, 444 
used with ASIs, 443 
writing to a CMP register, 536 
STF instruction, 339 
STFA instruction, 341 
STFSR instruction, 61, 63, 64, 502 
STH instruction, 331 
STHA instruction, 332 
STICK register, 71, 75, 84 
and hstick_match exception, 501
counter field, 84, 85 
fields after power-on reset trap, 85 
npt field, 75, 84 
RDSTICK instruction, 303 
state after reset, 568 
while virtual processor is parked, 544 
STICK_CMPR register, 71, 85
and HINTP, 108
int_dis field, 82, 86
RDSTICK_CMPR instruction, 303
state after reset, 569
stick_cmpr field, 86
store 
block, See block store instructions 
partial, See partial store instructions 
short floating-point, See short floating-point store 
instructions 
store buffer 
merging, 401 
store floating-point into alternate space 
instructions, 341 
store instructions, 17, 115, 500 
store_error exception, 504
StoreLoad MEMBAR relationship, 276, 417 
StoreLoad predefined constant, 614 
stores to alternate space, 29, 74, 122 
StoreStore MEMBAR relationship, 276, 417 
StoreStore predefined constant, 614 
STPARTIALF instruction, 347 
STQF instruction, 117, 339, 506
STQF_mem_address_not_aligned exception, 506
STQF/STQFA instruction, 117
STQFA instruction, 117, 341
strand, 17
STRAND_AVAILABLE register, 537, 541, 543, 544
state after reset, 570 
STRAND_ENABLE register, 542
state after reset, 570
STRAND_ENABLE_STATUS register, 542
state after reset, 570
STRAND_ID register, 537
state after reset, 571
STRAND_INTR_ID register, 516, 538, 560
state after reset, 571
STRAND_RUNNING register, 544, 545
simultaneous updates, 546
state after reset, 570
STRAND_RUNNING_RW pseudo-register, 545, 546
STRAND_RUNNING_STATUS register, 544, 548
Parked or Unparked status, 549
state after reset, 570
STRAND_RUNNING_W1C pseudo-register, 545, 546
STRAND_RUNNING_W1S pseudo-register, 545, 546
strong consistency memory model, 411 
strong ordering, 411 
Strong Sequential Order, 412 
strongly ordered page, illegal access to, 499 
STSHORTF instruction, 350 
STTW instruction, 55, 116 
STTW instruction (deprecated), 352 
STTWA instruction, 55, 116 
STTWA instruction (deprecated), 354 
STW instruction, 331 
STWA instruction, 332 
STX instruction, 331 
STXA instruction, 332 
accessing CMP-specific registers, 534 
accessing nontranslating ASIs, 355 
mem_address_not_aligned exception, 332
referencing internal ASIs, 412 
writing to a CMP register, 536 
STXFSR instruction, 61, 63, 64, 357, 502 
SUB instruction, 359
SUBC instruction, 359
SUBcc instruction, 124, 359
SUBCcc instruction, 359
subnormal number, 17 
subtract instructions, 359 
superscalar, 17 
supervisor software 
accessing special protected registers, 28 
definition, 17 
forcing processing into RED_state, 486
use of SIR trap, 495
suspend, 17
suspended, 17
SWAP instruction, 27, 360
accessing doubleword simultaneously with other instructions, 361
and data access exception (noncacheable page) exception, 499
hardware primitive for mutual exclusion, 414, 415
identification of R register to be exchanged, 115
in multiprocessor system, 262, 263
memory accessing, 360
ordering by MEMBAR, 416
swap R register bit contents, 165
with alternate space memory instructions, 361
with memory instructions, 360
SWAPA instruction, 361
accessing doubleword simultaneously with other instructions, 361
alternate space addressing, 29
and data access exception (noncacheable page) exception, 499
hardware primitive for mutual exclusion, 414
in multiprocessor system, 262, 263
ordering by MEMBAR, 416
SWTN (software trap number), 367
Sync predefined constant, 614
synchronization, 17, 278
Synchronous Fault Address register (SFAR), 9
Synchronous Fault Address Register (SFAR), See Data Synchronous Fault Address Register (D-SFAR)
synchronous fault status register, See SFSR register
synthetic instructions
mapping to SPARC V9 instructions, 616-618
for assembly language programmers, 616
mapping
bclr, 618
bset, 618
btog, 618
btst, 618
call, 616
casn, 618
clrn, 618
cmp, 616
dec, 618
deccc, 618
inc, 618
inccc, 618
iprefetch, 616
jmp, 616
movn, 618
neg, 618
not, 617
restore, 617
ret/retl, 616
save, 617
setn, 617
signx, 617
tst, 616
vs. pseudo ops, 616
system clock-tick register (STICK), 84
system software, 504
accessing memory space by server program, 405
ASIs allowing access to memory space, 406
FLUSH instruction, 190, 419
processing exceptions, 405
trap types from which software must recover, 64
System Tick Compare register, See STICK_CMPR register
System Tick register, See STICK register

T
TA instruction, 366, 581
TADDcc instruction, 125, 363
TADDccTV instruction, 125, 505
tag overflow, 125
tag_overflow exception, 125, 363, 364, 365, 369, 371
tag_overflow exception (deprecated), 505
tagged arithmetic, 125
tagged arithmetic instructions, 30
tagged word data format, 35
tagged words, 35
TBA (trap base address) register, 95, 455
establishing table address, 32, 453
initialization, 469
specification for RDPR instruction, 307
specification for WRPR instruction, 382
state after reset, 567
trap behavior, 18
TBR register (SPARC V8), 378
TCC instruction, 366
Tcc instructions, 366
at TL > 0, 470
causing trap, 453
causing trap to privileged trap handler, 473 
CCR register bits, 73 
generating htrap instruction exception, 501 
generating illegal instruction exception, 501 
generating trap instruction exception, 505 
opcode maps, 577, 581, 582 
programming uses, 368 
trap table space, 33 
vector through trap table, 453 
TCS instruction, 366, 581 
TE instruction, 366, 581 
termination deferred trap, 461 
test-and-set instruction, 415 
TG instruction, 366, 581 
TGE instruction, 366, 581 
TGU instruction, 366, 581 
thread, 18 
TICK register, 71 
controlling access to timing information, 76 
counter field, 75, 592, 607 
fields after power-on reset trap, 75 
inaccuracies between two readings of, 592, 607 
npt field, 76 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 568 
while virtual processor is parked, 544, 605 
TICK_CMPR register, 71, 83
int_dis field, 82, 84
state after reset, 568
tick_cmpr field, 84
timer registers, See TICK register and STICK register 
timing of instructions, 147 
tininess (floating-point), 69 
TL (trap level) register, 100, 455 
effect on privilege level to which a trap is delivered, 469
and implicit ASIs, 122 
displacement in trap table, 453 
executing RESTORED instruction, 311 
executing SAVED instruction, 319 
indexing for WRHPR instruction, 380 
indexing for WRPR instruction, 382 
indexing hyperprivileged register after 
RDHPR, 306 
indexing privileged register after RDPR, 307 
setting register value after WRHPR, 380 
setting register value after WRPR, 382
specification for RDPR instruction, 307
specification for WRPR instruction, 382 
state after reset, 568 
and TBA register, 469 
and TPC register, 91 
and TSTATE register, 93, 106 
and TT register, 94 
use in calculating privileged trap vector 
address, 469 
and VER.maxtl, 109 
and WSTATE register, 89 
TL instruction, 366, 581 
TLB, 18 
and 3-dimensional arrays, 155 
definition, 18 
hit, 18 
miss, 18 
handler, 518 
MMU behavior, 518 
reloading TLB, 518, 524 
partition IDs, 520 
TLE instruction, 366, 581 
TLEU instruction, 366, 581 
TN instruction, 366, 581 
TNE instruction, 366, 581 
TNEG instruction, 366, 581 
TNPC (trap next program counter) register, 92 
saving NPC, 461 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 568
TNPC (trap-saved next program counter) register, 18
total order, 410
total store order (TSO) memory model, 96, 277, 401, 410, 411, 412
TPC (trap program counter) register, 18, 91 
address of trapping instruction, 308 
number of instances, 91 
specification for RDPR instruction, 307
specification for WRPR instruction, 382
state after reset, 568
TPOS instruction, 366, 581
translating ASI, 422
Translation Lookaside Buffer, See TLB
Translation Table Entry, See TTE
trap 
See also exceptions and traps 
noncacheable accesses, 402 
when taken, 17 
trap enable mask (tem) field of FSR register, 467, 468, 588
trap handler 
for global registers, 104 
hyperprivileged mode, 472 
privileged mode, 472 
regular/nonfaulting loads, 12
returning from, 168, 313 
user, 65, 389 
trap level register, See TL register 
trap next program counter register, See TNPC register 
trap on integer condition codes instructions, 366 
trap program counter register, See TPC register 
trap state register, See TSTATE register 
trap type (TT) register, 472 
trap type register, See TT register 
trap_instruction (ISA) exception, 367, 368, 505
trap_level_zero exception, 106, 505
state after reset, 567 
with WRHPR instructions, 381 
with write instructions, 384 
trap little endian (tle) field of PSTATE register, 96 
traps, 18 
See also exceptions and individual trap names 
categories 
deferred, 461, 463
disrupting, 461, 463, 466
precise, 461, 463
priority, 468, 485
reset, 94, 461, 463, 466, 486, 589
restartable 
implementation dependency, 463 
restartable deferred, 462 
termination deferred, 461 
caused by undefined feature/behavior, 19 
causes, 33
definition, 32, 454 
hardware, 473 
hardware stack, 22 
level specification, 100 
model stipulations, 466 
nested, 22 
normal, 13, 457, 472, 487, 490 
processing, 486 
software, 367, 470, 473 
software initiated reset (SIR), 490 
special, 457, 473 
stack, 488 
vector address, specifying, 95, 108
vector, RED_state, 471
TSB, 18, 524 
cacheability, 525 
caching, 525 
indexing support, 524 
organization, 525 
TSO, 18 
TSO, See total store order (TSO) memory model 
tst synthetic instruction, 616 
TSTATE (trap state) register, 93 
DONE instruction, 168, 313 
registers saved after trap, 33 
restoring GL value, 104 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 568 
tstate, See trap state (TSTATE) register 
TSUBcc instruction, 125, 369 
TSUBccTV instruction, 125, 505 
TT (trap type) register, 94 
and privileged trap vector address, 469, 470 
reserved values, 589 
specification for RDPR instruction, 307 
specification for WRPR instruction, 382 
state after reset, 567 
and Tcc instructions, 368 
transferring trap control, 472 
trap type recorded after 
RED_state_exception, 504
window spill/fill exceptions, 89
WRHPR instruction, 380
WRPR instruction, 382 
TTE, 18 
context ID field, 521 
cp (cacheability) field, 400 
cp field, 499, 523
cv field, 523
e field, 401, 419, 499, 523 
ie field, 522 
indexing support, 524 
nfo field, 419, 499, 521, 523 
p field, 498, 523 
size field, 524 
soft2 field, 521
SPARC V8 equivalence, 520 
taddr field, 522 
V field, 521 
va tag field, 521 
w field, 524 
TVC instruction, 366, 581
TVS instruction, 366, 581 
typewriter font, in assembly language syntax, 609 


U 
UDIV instruction, 72, 372 
UDIVcc instruction, 72, 372 
UDIVX instruction, 287 
ufm (underflow mask) field of FSR.tem, 69
UltraSPARC, previous ASIs 
ASI_NUCLEUS_QUAD_LDD (deprecated), 447
ASI_NUCLEUS_QUAD_LDD_L (deprecated), 447
ASI_NUCLEUS_QUAD_LDD_LITTLE (deprecated), 447
ASI_PHYS_BYPASS_EC_WITH_EBIT_L, 447
ASI_PHYS_BYPASS_EC_WITH_EBIT, 447
ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE, 447
ASI_PHYS_USE_EC, 447
ASI_PHYS_USE_EC_L, 447
ASI_PHYS_USE_EC_LITTLE, 447
ASI_QUAD_LDD_L (deprecated), 447
ASI_QUAD_LDD_LITTLE (deprecated), 447
ASI_QUAD_LDD_PHYS (deprecated), 447
UMUL instruction, 72 
UMUL instruction (deprecated), 374 
UMULcc instruction, 72 
UMULcc instruction (deprecated), 374 
unassigned, 18 
unconditional branches, 156, 160, 176, 179 
undefined, 18 
underflow 
bits of FSR register 
accrued (ufa) bit of aexc field, 69, 389 
current (ufc) bit of cexc, 69 
current (ufc) bit of cexc field, 389 
mask (ufm) bit of FSR.tem, 69 
mask (ufm) bit of tem field, 389 
detection, 53 
occurrence, 507 
underflow mask (ufm) field of FSR.tem, 69 
unfinished_FPop floating-point trap type, 65, 174, 185, 210, 234, 235, 388
handling, 70 
in normal computation, 64 
results after recovery, 65 
UNIMP instruction (SPARC V8), 237 
unimplemented, 19
unimplemented_FPop floating-point trap type, 65, 173, 174, 184, 185, 187, 193, 199, 202, 210, 211, 232, 234, 235, 388
handling, 70
result after recovery, 65
unimplemented_LDTW exception, 266, 505
unimplemented_STTW exception, 353, 505
uniprocessor system, 19
unpark, 19
unparking CMP core, 544
unrestricted, 19
unrestricted ASI, 421
unsigned integer data type, 35
user application program, 19
user trap handler, 65, 389


V 
VA, 19 
VA_watchpoint exception, 505
VA_WATCHPOINT register, 505
value clipping, See FPACK instructions 
value semantics of input/output (I/O) 
locations, 400 
VER (version) register (SPARC V9), 109 
virtual 
address, 400 
address 0, 420 
virtual address, 19 
virtual core, 19 
virtual memory, 301 
virtual-translating ASI, 422 
VIS, 19 
VIS instructions 
encoding, 583, 584 
implicitly referencing GSR register, 80 
Visual Instruction Set, See VIS instructions 


W 

W-A-R, See write-after-read memory hazard
warm reset (WMR), 565
and STRAND_ENABLE register, 543
enabling/disabling virtual processors, 541, 542
machine state changes, 566
watchdog reset (POR)
and RED_state, 459
watchdog reset (WDR), 505
entering error_state, 460
exiting error_state, 566, 600
full-processor reset, 550 
invoking RED_state trap processing, 490 
per-strand reset, 550 
and XIR traps, 494 
watchdog reset (WDR), and guest watchdog, 455 
watchdog reset (WMR), 566 
watchpoint comparator, 98 
W-A-W, See write-after-write memory hazard 
WDR, 19 
WDR (watchdog reset), 566 
WDR, See watchdog reset (WDR) 
WIM register (SPARC V8), 378 
window fill exception, See also fill_n_normal exception
window fill trap handler, 32
window overflow, 53, 507
window spill exception, See also spill_n_normal exception
window spill trap handler, 32
window state register, See WSTATE register
window underflow, 507
window, clean, 317
window_fill exception, 89, 131, 473
RETURN, 315
window_spill exception, 89, 473
WMR (warm reset), 565 
machine state changes, 566 
word, 19 
alignment, 28, 116, 403 
data format, 35 
WRASI instruction, 70, 74, 376 
WRasr instruction, 376
accessing I/O registers, 29 
attempt to write to ASR 5 (PC), 76 
cannot write to PC register, 76 
implementation dependencies, 590 
writing ASRs, 70 
WRCCR instruction, 70, 72, 73, 376 
WRFPRS instruction, 71, 77, 376
WRGSR instruction, 71, 80, 376 
WRHPR instruction, 104, 105, 107, 380 
WRIER instruction (SPARC V8), 378 
write ancillary state register (WRasr) 
instructions, 376 
write ancillary state register instructions, See WRasr 
instruction 
write hyperprivileged register instruction, 380 
write privileged register instruction, 382
write-after-read memory hazard, 408
write-after-write memory hazard, 408
WRPCR instruction, 71, 376
WRPIC instruction, 71, 376, 504
WRPR instruction, 458
accessing non-register-window PR state registers, 91
accessing register-window PR state registers, 86
and register-window PR state registers, 86
effect on TNPC register, 92
effect on TPC register, 92
effect on TSTATE register, 94
effect on TT register, 95
writing the TICK register, 75
writing to GL register, 103
writing to PSTATE register, 95
writing to TICK register, 75
WRPSR instruction (SPARC V8), 378 
WRSOFTINT instruction, 71, 81, 376 
WRSOFTINT_CLR instruction, 71, 81, 83, 376, 513
WRSOFTINT_SET instruction, 71, 81, 82, 376, 512
WRSTICK instruction, 71, 84, 376
WRSTICK_CMPR instruction, 71, 376
WRTBR instruction (SPARC V8), 378
WRTICK_CMPR instruction, 71, 376
WRWIM instruction (SPARC V8), 378 
WRY instruction, 70, 72, 376 
WSTATE (window state) register
description, 89
and fill/spill exceptions, 508
normal field, 508
other field, 508
overview, 86
reading with RDPR instruction, 307
spill exception, 192
spill trap, 318
state after reset, 568
writing with WRPR instruction, 382


X 
XIR, 20 
XIR (externally initiated reset), 565 
XIR reset, 551 
XIR, See externally initiated reset (XIR) 
XIR_STEERING register, 552
state after reset, 570 
XNOR instruction, 385 
XNORcc instruction, 385 
XOR instruction, 385
XORcc instruction, 385 


Y 

Y register, 70, 72 
after multiplication completed, 285 
content after divide operation, 321, 372 
divide operation, 321, 372 
multiplication, 285 
state after reset, 567 
unsigned multiply results, 329, 374 
WRY instruction, 377 

Y register (deprecated), 72 


Z 


zero virtual address, 420 